Abstract

We consider a finite-state memoryless channel with i.i.d. channel state and the
input Markov process supported on a mixing finite-type constraint. We discuss the
asymptotic behavior of entropy rate of the output hidden Markov chain and deduce
that the mutual information rate of such a channel is concave with respect to the
parameters of the input Markov processes at high signal-to-noise ratio. In principle,
the concavity result enables good numerical approximation of the maximum mutual
information rate and capacity of such a channel.

1 Channel Model



In this paper, we show that for certain input-restricted finite-state memoryless channels,
the mutual information rate, at high SNR, is effectively a concave function of Markov input
processes of a given order. While not directly addressed here, the goal is to help estimate the
maximum of this function and ultimately the capacity of such channels (see, for example,
the algorithm of Vontobel et al. [H]).

Our approach depends heavily on results regarding asymptotics and smoothness of en- 
tropy rate in special parameterized families of hidden Markov chains, such as those developed 
in [5], [9], [3], [1], and continued here. 

We first discuss the nature of the constraints on the input. Let $\mathcal{X}$ be a finite alphabet.
Let $\mathcal{X}^n$ denote the set of words over $\mathcal{X}$ of length $n$ and let $\mathcal{X}^* = \cup_n \mathcal{X}^n$. A finite-type
constraint $\mathcal{S}$ is a subset of $\mathcal{X}^*$ defined by a finite list $\mathcal{F}$ of forbidden words [3 E]; in other
words, $\mathcal{S}$ is the set of words over $\mathcal{X}$ that do not contain any element in $\mathcal{F}$ as a contiguous
subsequence. We define $\mathcal{S}_n = \mathcal{S} \cap \mathcal{X}^n$. The constraint $\mathcal{S}$ is said to be mixing if there exists
$N$ such that, for any $u, v \in \mathcal{S}$ and any $n \ge N$, there is a $w \in \mathcal{S}_n$ such that $uwv \in \mathcal{S}$.

In magnetic recording, input sequences are required to satisfy certain constraints in order
to eliminate the most damaging error events [8]. The constraints are often mixing finite-type
constraints. The most well-known example is the $(d,k)$-RLL constraint $\mathcal{S}(d,k)$, which
forbids any sequence with fewer than $d$ or more than $k$ consecutive zeros in between two 1's.
For $\mathcal{S}(d,k)$ with $k < \infty$, a forbidden set $\mathcal{F}$ is:

$$\mathcal{F} = \{1\underbrace{0 \cdots 0}_{l}1 : 0 \le l < d\} \cup \{\underbrace{0 \cdots 0}_{k+1}\}.$$

When $k = \infty$, one can choose $\mathcal{F}$ to be

$$\mathcal{F} = \{1\underbrace{0 \cdots 0}_{l}1 : 0 \le l < d\};$$

in particular when $d = 1, k = \infty$, $\mathcal{F}$ can be chosen to be $\{11\}$.

The maximal length of a forbidden list $\mathcal{F}$ is the length of the longest word in $\mathcal{F}$. In
general, there can be many forbidden lists $\mathcal{F}$ which define the same finite-type constraint $\mathcal{S}$.
However, we may always choose a list with smallest maximal length. The (topological) order
of $\mathcal{S}$ is defined to be $\tilde{m} = \tilde{m}(\mathcal{S})$ where $\tilde{m} + 1$ is the smallest maximal length of any forbidden
list that defines $\mathcal{S}$ (the order of the trivial constraint $\mathcal{X}^*$ is taken to be 0). It is easy to see
that the order of $\mathcal{S}(d,k)$ is $k$ when $k < \infty$, and is $d$ when $k = \infty$; $\mathcal{S}(d,k)$ is mixing when
$d < k$.
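The forbidden lists above are easy to experiment with. The following is a minimal sketch, assuming a binary alphabet written as the characters "0" and "1"; the function names (`rll_forbidden_list`, `is_allowed`, `allowed_words`) are illustrative and do not come from the paper, and `k=None` is used here to stand for $k = \infty$.

```python
from itertools import product

def rll_forbidden_list(d, k=None):
    """Forbidden words for the (d, k)-RLL constraint; k=None stands for k = infinity."""
    # Two 1's separated by fewer than d zeros: 1 0^l 1 for 0 <= l < d.
    words = {"1" + "0" * l + "1" for l in range(d)}
    if k is not None:
        # More than k consecutive zeros: the single word 0^(k+1).
        words.add("0" * (k + 1))
    return words

def is_allowed(word, forbidden):
    """True if `word` contains no forbidden word as a contiguous subsequence."""
    return not any(f in word for f in forbidden)

def allowed_words(n, forbidden):
    """All binary words of length n that satisfy the constraint (brute force)."""
    candidates = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in candidates if is_allowed(w, forbidden)]
```

For these particular lists, the maximal word length minus one agrees with the order stated in the text: `rll_forbidden_list(1, 3)` has longest word `0000`, giving 3, and `rll_forbidden_list(2)` has longest word `101`, giving 2.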

For a stationary stochastic process $X$ over $\mathcal{X}$, the set of allowed words with respect to $X$
is defined as

$$\mathcal{A}(X) = \{w_{-n}^0 : n \ge 0,\ P(X_{-n}^0 = w_{-n}^0) > 0\}.$$

Note that for any $m$-th order stationary Markov process $X$, the constraint $\mathcal{S} = \mathcal{A}(X)$ is
necessarily of finite type with order $\tilde{m} \le m$, and we say that $X$ is supported on $\mathcal{S}$. Also, $X$
is mixing iff $\mathcal{S}$ is mixing (recall that a Markov chain is mixing if its transition probability
matrix, obtained by appropriately enlarging the state space, is irreducible and aperiodic).
Note that a Markov chain with support contained in a finite-type constraint $\mathcal{S}$ may have
order $m < \tilde{m}$.

Now, consider a finite-state memoryless channel with finite sets of channel states $c \in \mathcal{C}$,
inputs $x \in \mathcal{X}$, outputs $z \in \mathcal{Z}$ and input sequences restricted to a mixing finite-type constraint
$\mathcal{S}$. The channel state process $C$ is assumed to be i.i.d. with $P(C = c) = q_c$. Any stationary
input process $X$ must satisfy $\mathcal{A}(X) \subseteq \mathcal{S}$. Let $Z$ denote the stationary output process
corresponding to $X$; then at any time slot, the channel is characterized by the conditional
probability

$$p(z|x,c) = P(Z = z \,|\, X = x, C = c).$$

We are actually interested in families of channels, as above, parameterized by $\varepsilon \ge 0$ such
that for each $x$, $c$, and $z$, $p(z|x,c)(\varepsilon)$ is an analytic function of $\varepsilon \ge 0$. We assume that for all
$x, c, z$, $p(z|x,c)(\varepsilon)$ is not identically 0 as a function of $\varepsilon$, so that for small $\varepsilon > 0$, for any input
$x$ and channel state $c$, by analyticity, any output $z$ can occur. We also assume that there
is a one-to-one (not necessarily onto) mapping from $\mathcal{X}$ into $\mathcal{Z}$, $z = z(x)$, such that for all $c$
and $x$, $p(z(x)|x,c)(0) = 1$; so, $\varepsilon$ can be regarded as noise, and $z(x)$ is the noiseless output
corresponding to input $x$. Note that the output process $Z = Z(X,\varepsilon)$ depends on the input
process $X$ and the parameter value $\varepsilon$; we will often suppress the notational dependence on
$\varepsilon$ or $X$, when it is clear from context.

Prominent examples of such families include input-restricted versions of the binary symmetric
channel with crossover probability $\varepsilon$ (denoted by BSC($\varepsilon$)), the binary erasure channel
with erasure rate $\varepsilon$ (denoted by BEC($\varepsilon$)), and some special Gilbert-Elliott channels, where
the channel state process is a 2-state i.i.d. process, with one state acting as BSC($\varepsilon$) and the
other state acting as BSC($k\varepsilon$) for some fixed $k$; see Section 3 of [4].
Recall that the entropy rate of $Z = Z(X,\varepsilon)$ is, as usual, defined as

$$H(Z) = \lim_{n \to \infty} H_n(Z),$$

where

$$H_n(Z) = H(Z_0 \,|\, Z_{-n}^{-1}) = -\sum_{z_{-n}^0} p(z_{-n}^0) \log p(z_0|z_{-n}^{-1}).$$

The mutual information rate between $Z$ and $X$ can be defined as

$$I(Z;X) = \lim_{n \to \infty} I_n(Z;X),$$

where

$$I_n(Z;X) = H_n(Z) - \frac{1}{n+1} H(Z_{-n}^0 \,|\, X_{-n}^0).$$

Given the memoryless assumption, one can check that the second term above is simply
$H(Z_0|X_0)$ and in particular does not depend on $n$.
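For small $n$, these finite-order quantities can be evaluated by brute force. The sketch below is only an illustration under stated assumptions: it takes a binary symmetric channel with crossover probability `eps` and a binary first-order Markov input that never emits two consecutive 1's (transition probability `p` from 0 to 1), computes output word probabilities with a standard forward recursion, and works in nats. The names `omega`, `joint_prob`, `H_n` and `I_n` are hypothetical helpers, not notation fixed by the paper.

```python
from itertools import product
from math import log

def omega(z, p, eps):
    """Matrix with entries Pi[x][y] * p(z | y) for the BSC(eps), single channel state."""
    Pi = [[1 - p, p], [1.0, 0.0]]                   # input never emits "11"
    emit = lambda y: 1 - eps if y == z else eps     # p(z | y)
    return [[Pi[x][y] * emit(y) for y in range(2)] for x in range(2)]

def joint_prob(zs, p, eps):
    """Probability of the output word zs via the forward recursion."""
    v = [1 / (1 + p), p / (1 + p)]                  # stationary vector of Pi
    for z in zs:
        M = omega(z, p, eps)
        v = [v[0] * M[0][y] + v[1] * M[1][y] for y in range(2)]
    return v[0] + v[1]

def H_n(n, p, eps):
    """H(Z_0 | Z_{-n}^{-1}) by enumerating all output words of length n + 1."""
    total = 0.0
    for zs in product((0, 1), repeat=n + 1):
        joint = joint_prob(zs, p, eps)
        if joint > 0:
            total -= joint * log(joint / joint_prob(zs[:-1], p, eps))
    return total

def I_n(n, p, eps):
    """I_n(Z; X) = H_n(Z) - H(Z_0 | X_0); for the BSC the last term is h(eps)."""
    h = 0.0 if eps in (0.0, 1.0) else -eps * log(eps) - (1 - eps) * log(1 - eps)
    return H_n(n, p, eps) - h
```

At `eps = 0` the output equals the input, so `H_n(n, p, 0.0)` reduces to the entropy rate of the input chain, $h(p)/(1+p)$, for every $n \ge 1$. The enumeration is exponential in $n$, so this is only meant for small examples.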

Under our assumptions, if $X$ is a Markov chain, then for each $\varepsilon \ge 0$, the output process
$Z = Z(X,\varepsilon)$ is a hidden Markov chain and in fact satisfies the "weak Black Hole" assumption
of [5], where an asymptotic formula for $H(Z)$ is developed; the asymptotics are given as an
expansion in $\varepsilon$ around $\varepsilon = 0$. In Section 2, we further develop these ideas to establish
smoothness properties of $H(Z)$ as a function of $\varepsilon$ and the Markov chain input $X$ of a fixed
order. In particular, we show that $H(Z)$ can be expressed as $G(X,\varepsilon) + F(X,\varepsilon)\log(\varepsilon)$, where
$G(X,\varepsilon)$ and $F(X,\varepsilon)$ are smooth (i.e., infinitely differentiable) functions of $\varepsilon$ near 0 for any
first order $X$ supported on $\mathcal{S}$ (in fact, $F(X,\varepsilon)$ will be analytic); the $\log(\varepsilon)$ term arises from
the fact that the support of $X$ will be contained in a non-trivial finite-type constraint and
so $X$ will necessarily have some zero transition probabilities; this prevents $H(Z)$ from being
smooth in $\varepsilon$ at 0.

In Section 3, we apply the smoothness results to show that for a mixing finite-type
constraint $\mathcal{S}$ of order 1, and sufficiently small $\varepsilon_0 > 0$, for each $0 \le \varepsilon \le \varepsilon_0$, $I_n(Z(X,\varepsilon);X)$
and $I(Z(X,\varepsilon);X)$ are strictly concave on the set of all first order $X$ whose non-zero transition
probabilities are not "too small". This will imply that there are unique first order Markov
chains $X_n = X_n(\varepsilon)$, $X_\infty = X_\infty(\varepsilon)$ such that $X_n$ maximizes $I_n(Z(X,\varepsilon),X)$ and $X_\infty$ maximizes
$I(Z(X,\varepsilon),X)$. It will also follow that $X_n(\varepsilon)$ converges exponentially to $X_\infty(\varepsilon)$ uniformly
over $0 \le \varepsilon \le \varepsilon_0$. In principle, the concavity result enables (via any convex optimization
algorithm) good numerical approximation of $X_n(\varepsilon)$ and $X_\infty(\varepsilon)$ and therefore the maximum
mutual information rate over first order $X$. This can be generalized to $m$-th order Markov
chains, and as $m \to \infty$, this maximum converges to channel capacity; furthermore it can be
generalized to higher order constraints.
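To make the optimization remark concrete, here is a small sketch under explicit assumptions: the channel is a BSC with crossover probability `eps`, the input is a binary first-order Markov chain that never emits two consecutive 1's, and the input is parameterized by the single joint probability `theta` of the word 01 (so the 0-to-1 transition probability is `theta / (1 - theta)`). Ternary search is used here as a stand-in for "any convex optimization algorithm"; it is not the method of the paper, and all names are illustrative.

```python
from itertools import product
from math import log

def I_n(n, theta, eps):
    """I_n(Z; X) in nats; theta = P(X_{-1} X_0 = 01) parameterizes the input chain."""
    p = theta / (1 - theta)                 # transition probability 0 -> 1
    Pi = [[1 - p, p], [1.0, 0.0]]
    pi = [1 - theta, theta]                 # stationary vector (1/(1+p), p/(1+p))

    def joint(zs):
        # Forward recursion for the probability of the output word zs.
        v = pi
        for z in zs:
            v = [sum(v[x] * Pi[x][y] for x in range(2)) * (1 - eps if y == z else eps)
                 for y in range(2)]
        return sum(v)

    H = 0.0
    for zs in product((0, 1), repeat=n + 1):
        pj = joint(zs)
        if pj > 0:
            H -= pj * log(pj / joint(zs[:-1]))
    h_eps = -eps * log(eps) - (1 - eps) * log(1 - eps) if 0 < eps < 1 else 0.0
    return H - h_eps

def maximize_unimodal(f, lo, hi, iters=60):
    """Ternary search for the maximizer of a unimodal (e.g. concave) function."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

# Keep theta away from the boundary, mirroring the "not too small" condition.
theta_star = maximize_unimodal(lambda t: I_n(4, t, 1e-3), 0.05, 0.45)
p_star = theta_star / (1 - theta_star)
```

For very small `eps` the maximizing transition probability should sit close to $(3 - \sqrt{5})/2 \approx 0.382$, the value that maximizes the entropy rate of the noiseless constrained input.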



2 Asymptotics of Entropy Rate 

2.1 Key ideas and lemmas 

For simplicity, we consider only mixing finite-type constraints $\mathcal{S}$ of order 1, and correspondingly
only first order input Markov processes $X$ such that $\mathcal{A}(X) \subseteq \mathcal{S}$ (the higher order case
is easily reduced to this). For such $X$ with transition probability matrix $\Pi$, $(X, C)$ is also a
first order Markov chain, with transition probability matrix:

$$\Omega((x,c),(y,d)) = \Pi_{x,y}\, q_d.$$

For any $z \in \mathcal{Z}$, define

$$\Omega_z((x,c),(y,d)) = \Pi_{x,y}\, q_d\, p(z|y,d). \qquad (1)$$

Note that $\Omega_z$ implicitly depends on $\varepsilon$ through $p(z|y,d)$. One checks that

$$\sum_{z \in \mathcal{Z}} \Omega_z = \Omega,$$

and

$$p(z_{-n}^0) = \pi\, \Omega_{z_{-n}} \Omega_{z_{-n+1}} \cdots \Omega_{z_0} \mathbf{1}, \qquad (2)$$

where $\pi$ is the stationary vector of $\Omega$ and $\mathbf{1}$ is the all 1's column vector.

For a given analytic function $f(\varepsilon)$ around $\varepsilon = 0$, let $\mathrm{ord}(f(\varepsilon))$ denote its order with
respect to $\varepsilon$, i.e., the degree of the first non-zero term of its Taylor series expansion around
$\varepsilon = 0$. Thus, the orders $\mathrm{ord}(p(z|x,c))$ determine the orders $\mathrm{ord}(p(z_{-n}^0))$ and similarly orders
of conditional probabilities $\mathrm{ord}(p(z_0|z_{-n}^{-1}))$.

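Example 2.1. Consider a binary symmetric channel with crossover probability $\varepsilon$ and a
binary input Markov chain $X$ supported on the $(1,\infty)$-RLL constraint with transition probability
matrix

$$\Pi = \begin{bmatrix} 1-p & p \\ 1 & 0 \end{bmatrix},$$

where $0 < p < 1$. Here there is only one channel state, and so we can suppress dependence
on the channel state. The channel is characterized by the conditional probability

$$p(z|x) = p(z|x)(\varepsilon) = \begin{cases} 1-\varepsilon & \text{if } z = x, \\ \varepsilon & \text{if } z \ne x. \end{cases}$$

Let $Z$ be the corresponding output binary hidden Markov chain. Now we have

$$\Omega_0 = \begin{bmatrix} (1-p)(1-\varepsilon) & p\varepsilon \\ 1-\varepsilon & 0 \end{bmatrix}, \qquad \Omega_1 = \begin{bmatrix} (1-p)\varepsilon & p(1-\varepsilon) \\ \varepsilon & 0 \end{bmatrix}.$$

The stationary vector is $\pi = (1/(p+1),\, p/(p+1))$, and one computes, for instance,

$$p(z_{-2}z_{-1}z_0 = 110) = \pi\,\Omega_1\Omega_1\Omega_0\mathbf{1} = \frac{2p - p^2}{1+p}\,\varepsilon + O(\varepsilon^2),$$

which has order 1.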
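The expansion in Example 2.1 can be checked numerically. The following sketch evaluates $\pi\Omega_1\Omega_1\Omega_0\mathbf{1}$ for a small crossover probability and compares it with the leading term; the helper names (`omega`, `prob_110`, `leading_term`) are illustrative only.

```python
def omega(z, p, eps):
    """Omega_z of Example 2.1: entries Pi[x][y] * p(z | y)."""
    # e0 = p(z | y = 0), e1 = p(z | y = 1) for the BSC(eps).
    e0, e1 = (1 - eps, eps) if z == 0 else (eps, 1 - eps)
    return [[(1 - p) * e0, p * e1], [e0, 0.0]]

def left_multiply(v, M):
    """Row vector times 2x2 matrix."""
    return [sum(v[x] * M[x][y] for x in range(2)) for y in range(2)]

def prob_110(p, eps):
    """p(z_{-2} z_{-1} z_0 = 110) = pi * Omega_1 * Omega_1 * Omega_0 * 1."""
    v = [1 / (1 + p), p / (1 + p)]      # stationary vector
    for z in (1, 1, 0):
        v = left_multiply(v, omega(z, p, eps))
    return sum(v)

def leading_term(p, eps):
    """First-order term (2p - p^2) / (1 + p) * eps from the example."""
    return (2 * p - p * p) / (1 + p) * eps
```

For instance, with `p = 0.3` the ratio `prob_110(0.3, eps) / leading_term(0.3, eps)` approaches 1 as `eps` decreases, consistent with the word 110 having order 1.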



Let $\mathcal{M}$ denote the set of all first order stationary Markov chains $X$ satisfying $\mathcal{A}(X) \subseteq \mathcal{S}$.
Let $\mathcal{M}_\delta$, $\delta \ge 0$, denote the set of all $X \in \mathcal{M}$ such that $p(w_{-1}^0) > \delta$ for all $w_{-1}^0 \in \mathcal{S}_2$. Note
that whenever $X \in \mathcal{M}_0$, i.e., $\mathcal{A}(X) = \mathcal{S}$, $X$ is mixing (thus its transition probability matrix
$\Pi$ is primitive) since $\mathcal{S}$ is mixing, so $X$ is completely determined by its transition probability
matrix $\Pi$. For the purpose of this paper, however, we find it convenient to identify each
$X \in \mathcal{M}_0$ with its vector of joint probabilities $\vec{p} = \vec{p}_X$ on words of length 2 instead:

$$\vec{p} = \vec{p}_X = (P(X_{-1}^0 = w_{-1}^0) : w_{-1}^0 \in \mathcal{S}_2);$$

sometimes we write $X = X(\vec{p})$.

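In the following, for any parameterized sequence of functions $f_{n,\lambda}(\varepsilon)$ ($\varepsilon$ is real or complex),
we use

$$f_{n,\lambda}(\varepsilon) = \hat{O}(\varepsilon^n) \text{ on } \Lambda$$

to mean that there exist constants $C, \beta_1, \beta_2 > 0$, $\varepsilon_0 > 0$ such that for all $n$, all $\lambda \in \Lambda$ and all
$0 \le |\varepsilon| \le \varepsilon_0$,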

$$|f_{n,\lambda}(\varepsilon)| \le n^{\beta_1} (C|\varepsilon|^{\beta_2})^n.$$

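Note that $f_{n,\lambda}(\varepsilon) = \hat{O}(\varepsilon^n)$ on $\Lambda$ implies that there exists $\varepsilon_0 > 0$ and $0 < \rho < 1$ such that
$|f_{n,\lambda}(\varepsilon)| < \rho^n$ for all $|\varepsilon| \le \varepsilon_0$, all $\lambda \in \Lambda$ and large enough $n$. One also checks that a
$\hat{O}(\varepsilon^n)$-term is unaffected by multiplication by an exponential function (thus polynomial function)
in $n$ and a polynomial function in $1/\varepsilon$:

Remark 2.2. For any given $f_{n,\lambda}(\varepsilon) = \hat{O}(\varepsilon^n)$, there exists $\varepsilon_0 > 0$ and $0 < \rho < 1$ such that
$|g_1(n) g_2(1/\varepsilon) f_{n,\lambda}(\varepsilon)| < \rho^n$, for all $|\varepsilon| \le \varepsilon_0$, all $\lambda \in \Lambda$, all polynomial functions $g_1(n), g_2(1/\varepsilon)$
and large enough $n$.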

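Of course, the output joint probabilities $p(z_{-n}^0)$ and conditional probabilities $p(z_0|z_{-n}^{-1})$
implicitly depend on $\vec{p} \in \mathcal{M}_0$ and $\varepsilon$. The following result asserts that for small $\varepsilon$, the total
probability of output sequences with "large" order is exponentially small, uniformly over all
input processes.

Lemma 2.3. For any fixed $0 < \alpha < 1$,

$$\sum_{\mathrm{ord}(p(z_{-n}^{-1})) \ge \alpha n} p(z_{-n}^{-1}) = \hat{O}(\varepsilon^n) \text{ on } \mathcal{M}_0.$$

Proof. Note that for any hidden Markov chain sequence $z_{-n}^{-1}$, we have

$$p(z_{-n}^{-1}) = \sum p(x_{-n}^{-1}, c_{-n}^{-1}) \prod_{i=-n}^{-1} p(z_i|x_i,c_i), \qquad (3)$$

where the summation is over all $(x_{-n}^{-1}, c_{-n}^{-1})$. Now consider $z_{-n}^{-1}$ with $k = \mathrm{ord}(p(z_{-n}^{-1})) \ge \alpha n$.
One checks that for $\varepsilon$ small enough there exists a positive constant $C$ such that $p(z|x,c) \le C\varepsilon$
for $(x,c,z)$ with $\mathrm{ord}(p(z|x,c)) \ge 1$, and thus the term $\prod_{i=-n}^{-1} p(z_i|x_i,c_i)$ as in (3) is
upper bounded by $C^k\varepsilon^k$, which is upper bounded by $C^{\alpha n}\varepsilon^{\alpha n}$ for $\varepsilon < 1/C$. Noticing that
$\sum_{x_{-n}^{-1}, c_{-n}^{-1}} p(x_{-n}^{-1}, c_{-n}^{-1}) = 1$, we then have, for $\varepsilon$ small enough,

$$\sum_{\mathrm{ord}(p(z_{-n}^{-1})) \ge \alpha n} p(z_{-n}^{-1}) \le \sum_{z_{-n}^{-1}} \sum_{x_{-n}^{-1}, c_{-n}^{-1}} p(x_{-n}^{-1}, c_{-n}^{-1})\, C^{\alpha n}\varepsilon^{\alpha n} \le |\mathcal{Z}|^n C^{\alpha n} \varepsilon^{\alpha n},$$

which immediately implies the lemma. $\Box$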

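Now for any $\delta > 0$, consider a first order Markov chain $X \in \mathcal{M}_\delta$ with transition probability
matrix $\Pi$ (note that $X$ is necessarily mixing). Let $\Pi^{\mathbb{C}}$ denote a complex "transition
probability matrix" obtained by perturbing all entries of $\Pi$ to complex numbers, while satisfying
$\sum_y \Pi^{\mathbb{C}}_{xy} = 1$. Then through solving the following system of equations

$$\pi^{\mathbb{C}} \Pi^{\mathbb{C}} = \pi^{\mathbb{C}}, \qquad \sum_y \pi^{\mathbb{C}}_y = 1,$$

one can obtain a complex "stationary probability" $\pi^{\mathbb{C}}$, which is uniquely defined if the
perturbation of $\Pi$ is small enough. It then follows that under a complex perturbation of $\Pi$,
for any Markov chain sequence $x_{-n}^0$, one can obtain a complex version of $p(x_{-n}^0)$ through
complexifying all terms in the following expression:

$$p(x_{-n}^0) = \pi_{x_{-n}} \Pi_{x_{-n},x_{-n+1}} \cdots \Pi_{x_{-1},x_0},$$

namely,

$$p^{\mathbb{C}}(x_{-n}^0) = \pi^{\mathbb{C}}_{x_{-n}} \Pi^{\mathbb{C}}_{x_{-n},x_{-n+1}} \cdots \Pi^{\mathbb{C}}_{x_{-1},x_0};$$

in particular, the joint probability vector $\vec{p}$ can be complexified to $\vec{p}^{\,\mathbb{C}}$ as well. We then use
$\mathcal{M}^{\mathbb{C}}_\delta(\eta)$, $\eta > 0$, to denote the $\eta$-perturbed complex version of $\mathcal{M}_\delta$; more precisely,

$$\mathcal{M}^{\mathbb{C}}_\delta(\eta) = \{(p^{\mathbb{C}}(w_{-1}^0) : w_{-1}^0 \in \mathcal{S}_2) \;|\; \|\vec{p}^{\,\mathbb{C}} - \vec{p}\| \le \eta \text{ for some } \vec{p} \in \mathcal{M}_\delta\},$$

which is well-defined if $\eta$ is small enough. Furthermore, together with a small complex
perturbation of $\varepsilon$, one can obtain a well-defined complex version $p^{\mathbb{C}}(z_{-n}^0)$ of $p(z_{-n}^0)$ through
complexifying (1) and (2).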

Using the same argument as in Lemma 2.3 and applying the triangle inequality to the
absolute value of (3), we have

Lemma 2.4. For any $\delta > 0$, there exists $\eta > 0$ such that for any fixed $0 < \alpha < 1$,

$$\sum_{\mathrm{ord}(p^{\mathbb{C}}(z_{-n}^{-1})) \ge \alpha n} |p^{\mathbb{C}}(z_{-n}^{-1})| = \hat{O}(|\varepsilon|^n) \text{ on } \mathcal{M}^{\mathbb{C}}_\delta(\eta).$$

By Lemma 2.3 and Lemma 2.4, we can focus our attention on output sequences with
relatively small order. For a fixed positive $\alpha$, a sequence $z_{-n}^{-1} \in \mathcal{Z}^n$ is said to be $\alpha$-typical if
$\mathrm{ord}(p(z_{-n}^{-1})) \le \alpha n$; let $T_n^\alpha$ denote the set of all $\alpha$-typical $\mathcal{Z}$-sequences with length $n$. Note
that this definition is independent of $\vec{p} \in \mathcal{M}_0$.



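For a smooth mapping $f(x)$ from $\mathbb{R}^k$ to $\mathbb{R}$ and a nonnegative integer $\ell$, $D_x^\ell f$ denotes the
$\ell$-th total derivative with respect to $x$; for instance,

$$D_x f = \left(\frac{\partial f}{\partial x_i}\right)_i \quad \text{and} \quad D_x^2 f = \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)_{i,j}.$$

In particular, if $x = \vec{p} \in \mathcal{M}_0$ or $x = (\vec{p},\varepsilon) \in \mathcal{M}_0 \times [0,1]$, this defines the derivatives
$D^\ell_{\vec{p}}\, p(z_0|z_{-n}^{-1})$ or $D^\ell_{\vec{p},\varepsilon}\, p(z_0|z_{-n}^{-1})$. We shall use $|\cdot|$ to denote the Euclidean norm (of a vector or
a matrix), and we shall use $\|A\|$ to denote the norm of a matrix $A$ as a linear map under the
Euclidean norm, i.e.,

$$\|A\| = \sup_{x \ne 0} \frac{|Ax|}{|x|}.$$

It is well known that $\|A\| \le |A|$.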

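In this paper, we are interested in functions of $\vec{q} = (\vec{p},\varepsilon)$. For any smooth function $f$ of
$\vec{q}$ and $n = (n_1, n_2, \cdots, n_{|\mathcal{S}_2|+1}) \in \mathbb{Z}_+^{|\mathcal{S}_2|+1}$, define

$$f^{(n)} = \frac{\partial^{|n|} f}{\partial q_1^{n_1} \partial q_2^{n_2} \cdots \partial q_{|\mathcal{S}_2|+1}^{n_{|\mathcal{S}_2|+1}}};$$

here $|n|$ denotes the order of the $n$-th derivative of $f$ with respect to $\vec{q}$, and is defined as

$$|n| = n_1 + n_2 + \cdots + n_{|\mathcal{S}_2|+1}.$$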

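The next result shows, in a precise form, that for $\alpha$-typical sequences $z_{-n}^{-1}$, the derivatives,
of all orders, of the difference between $p(z_0|z_{-n}^{-1})$ and $p(z_0|z_{-n-1}^{-1})$ converge exponentially in
$n$, uniformly in $\vec{p}$ and $\varepsilon$. For $n \le m, \hat{m} \le 2n$, define

$$T^\alpha_{n,m,\hat{m}} = \{(z_{-m}^{-1}, \hat{z}_{-\hat{m}}^{-1}) \in \mathcal{Z}^m \times \mathcal{Z}^{\hat{m}} \;|\; z_{-n}^{-1} = \hat{z}_{-n}^{-1} \text{ is } \alpha\text{-typical}\}.$$

Proposition 2.5. Assume $n \le m, \hat{m} \le 2n$. Given $\delta_0 > 0$, there exists $\alpha > 0$ such that for
any $\ell$

$$|D^\ell_{\vec{p},\varepsilon}\, p(z_0|z_{-m}^{-1}) - D^\ell_{\vec{p},\varepsilon}\, p(z_0|\hat{z}_{-\hat{m}}^{-1})| = \hat{O}(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0} \times T^\alpha_{n,m,\hat{m}}.$$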



The proof of Proposition 2.5 depends on estimates of derivatives of certain induced maps
on a simplex, which we now describe. Let $\mathcal{W}$ denote the unit simplex in $\mathbb{R}^{|\mathcal{X}| \cdot |\mathcal{C}|}$, i.e., the set
of nonnegative vectors, which sum to 1, indexed by the joint input-state space $\mathcal{X} \times \mathcal{C}$. For
any $z \in \mathcal{Z}$, $\Omega_z$ induces a mapping $f_z$ defined on $\mathcal{W}$ by

$$f_z(w) = \frac{w\Omega_z}{w\Omega_z\mathbf{1}}. \qquad (4)$$

Note that $\Omega_z$ implicitly depends on the input Markov chain $\vec{p} \in \mathcal{M}_0$ and $\varepsilon$, and thus so does
$f_z$. While $w\Omega_z\mathbf{1}$ can vanish at $\varepsilon = 0$, it is easy to check that for all $w \in \mathcal{W}$, $\lim_{\varepsilon \to 0} f_z(w)$
exists, and so $f_z$ can be defined at $\varepsilon = 0$. Let $O_M$ denote the largest order of all entries of
$\Omega_z$ (with respect to $\varepsilon$) for all $z \in \mathcal{Z}$, or equivalently, the largest order of $p(z|x,c)(\varepsilon)$ over all
possible $x, c, z$.
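The induced map can be iterated to compute conditional output probabilities, which is how it is used later in the proof. The sketch below does this for the BSC with the $(1,\infty)$-RLL input of Example 2.1, assuming `eps > 0` so that the normalization never divides by zero; the function names are illustrative, and starting the recursion at the stationary vector is a choice made here for the example.

```python
def omega(z, p, eps):
    """Omega_z for the BSC(eps) with (1, inf)-RLL input (single channel state)."""
    e0, e1 = (1 - eps, eps) if z == 0 else (eps, 1 - eps)
    return [[(1 - p) * e0, p * e1], [e0, 0.0]]

def f(z, w, p, eps):
    """Induced map on the simplex: f_z(w) = w Omega_z / (w Omega_z 1)."""
    M = omega(z, p, eps)
    u = [sum(w[x] * M[x][y] for x in range(2)) for y in range(2)]
    s = sum(u)                      # w Omega_z 1; positive whenever eps > 0
    return [ui / s for ui in u]

def cond_prob(z0, past, p, eps):
    """p(z0 | past): iterate f over the past symbols, then apply w Omega_{z0} 1."""
    w = [1 / (1 + p), p / (1 + p)]  # start from the stationary vector
    for z in past:
        w = f(z, w, p, eps)
    M = omega(z0, p, eps)
    return sum(w[x] * M[x][y] for x in range(2) for y in range(2))

def joint_prob(zs, p, eps):
    """Unnormalized forward recursion, used only as a cross-check."""
    v = [1 / (1 + p), p / (1 + p)]
    for z in zs:
        M = omega(z, p, eps)
        v = [sum(v[x] * M[x][y] for x in range(2)) for y in range(2)]
    return sum(v)
```

The normalized recursion and the ratio of joint probabilities agree, for example `cond_prob(0, (1, 0, 1), 0.3, 0.1)` equals `joint_prob((1, 0, 1, 0), 0.3, 0.1) / joint_prob((1, 0, 1), 0.3, 0.1)` up to rounding.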

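For $\varepsilon_0, \delta_0 > 0$, let

$$U_{\delta_0,\varepsilon_0} = \{\vec{p} \in \mathcal{M}_{\delta_0},\ \varepsilon \in [0,\varepsilon_0]\}.$$

Lemma 2.6. Given $\delta_0 > 0$, there exists $\varepsilon_0 > 0$ and $C_e > 0$ such that on $U_{\delta_0,\varepsilon_0}$, for any $z \in \mathcal{Z}$,
$|D_w f_z| \le C_e/\varepsilon^{2O_M}$ on the entire simplex $\mathcal{W}$.

Proof. Given $\delta_0 > 0$, there exist $\varepsilon_0 > 0$ and $C > 0$ such that for any $z \in \mathcal{Z}$, $w \in \mathcal{W}$, we
have, for all $0 \le \varepsilon \le \varepsilon_0$,

$$|w\Omega_z\mathbf{1}| \ge C\varepsilon^{O_M}.$$

We then apply the quotient rule to establish the lemma. $\Box$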

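For any sequence $z_{-N}^{-1} \in \mathcal{Z}^N$, define

$$\Omega_{z_{-N}^{-1}} = \Omega_{z_{-N}} \Omega_{z_{-N+1}} \cdots \Omega_{z_{-1}}.$$

Similar to (4), $\Omega_{z_{-N}^{-1}}$ induces a mapping $f_{z_{-N}^{-1}}$ on $\mathcal{W}$ by:

$$f_{z_{-N}^{-1}}(w) = \frac{w\Omega_{z_{-N}^{-1}}}{w\Omega_{z_{-N}^{-1}}\mathbf{1}}.$$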

By the chain rule, Lemma 2.6 gives upper bounds on derivatives of $f_{z_{-N}^{-1}}$. However, these
bounds can be improved considerably in certain cases, as we now describe. A sequence
$z_{-N}^{-1} \in \mathcal{Z}^N$ is Z-allowed if there exists $x_{-N}^{-1} \in \mathcal{A}(X)$ such that

$$z_{-N}^{-1} = z(x_{-N}^{-1}) = (z(x_{-N}), z(x_{-N+1}), \cdots, z(x_{-1})).$$

Note that $z_{-N}^{-1}$ is Z-allowed iff $\mathrm{ord}(p(z_{-N}^{-1})) = 0$.

Since $\Pi$ is a primitive matrix, there exists a positive integer $e$ such that $\Pi^e > 0$. For any
$z \in \mathcal{Z}$, let $I_z$ denote the set of indices of the columns $(x,c)$ of $\Omega_z$ such that $z = z(x)$; note
that $I_z$ can be empty for some $z \in \mathcal{Z}$.

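Lemma 2.7. Assume that $X \in \mathcal{M}_0$. For any Z-allowed sequence $z_{-N}^{-1} = z(x_{-N}^{-1}) \in \mathcal{Z}^N$
(here $x_{-N}^{-1} \in \mathcal{S}$), if $N \ge 2eO_M$, we have

$$\mathrm{ord}(\Omega_{z_{-N}^{-1}}(s,t_1)) = \mathrm{ord}(\Omega_{z_{-N}^{-1}}(s,t_2)),$$

for all $s$, and any $t_1, t_2 \in I_{z_{-1}}$, and

$$\mathrm{ord}(\Omega_{z_{-N}^{-1}}(s,t_1)) < \mathrm{ord}(\Omega_{z_{-N}^{-1}}(s,t_2)),$$

for all $s$, and any $t_1 \in I_{z_{-1}}$, $t_2 \notin I_{z_{-1}}$.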

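Proof. Let $s = (x_{-N-1}, c_{-N-1})$, $t = (x_{-1}, c_{-1}) \in \mathcal{X} \times \mathcal{C}$. Then

$$\Omega_{z_{-N}^{-1}}(s,t) = P((X_{-1},C_{-1}) = (x_{-1},c_{-1}),\ Z_{-N}^{-1} = z_{-N}^{-1} \,|\, (X_{-N-1},C_{-N-1}) = (x_{-N-1},c_{-N-1}))$$
$$= p((x_{-1},c_{-1}),\ z_{-N}^{-1} \,|\, (x_{-N-1},c_{-N-1})).$$

It then follows that

$$\mathrm{ord}(\Omega_{z_{-N}^{-1}}(s,t)) = \mathrm{ord}(p((x_{-1},c_{-1}), z_{-N}^{-1} \,|\, (x_{-N-1},c_{-N-1}))) = \mathrm{ord}(p((x_{-N-1},c_{-N-1}), z_{-N}^{-1}, (x_{-1},c_{-1}))).$$

Since

$$p((x_{-N-1},c_{-N-1}), z_{-N}^{-1}, (x_{-1},c_{-1})) = \sum_{x_{-N}^{-2},\, c_{-N}^{-2}} p(x_{-N-1}^{-1}, c_{-N-1}^{-1}, z_{-N}^{-1}),$$

we have

$$\mathrm{ord}(\Omega_{z_{-N}^{-1}}(s,t)) = \min \sum_{i=-N}^{-1} \mathrm{ord}(p(z_i|x_i,c_i)),$$

where the minimization is over all sequences $(x_{-N}^{-2}, c_{-N}^{-2})$ such that $x_{-N-1}^{-1} \in \mathcal{S}$.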

Since $\Pi^e > 0$, there exists some $\hat{x}_{-N-1}^{-N-1+e}$ such that $\hat{x}_{-N-1+e} = x_{-N-1+e}$ and $p(\hat{x}_{-N-1}^{-N-1+e}) > 0$,
and there exists some $\hat{x}_{-e}^{-1}$ such that $\hat{x}_{-e} = x_{-e}$ and $p(\hat{x}_{-e}^{-1}) > 0$. It then follows from
$\mathrm{ord}(p(z|x,c)) \le O_M$ that, as long as $N \ge 2eO_M$, for any fixed $t$ and any choice of order
minimizing sequence $(x_{-N}^{-2}(t), c_{-N}^{-2}(t))$, there exist $0 \le i_0 = i_0(t), j_0 = j_0(t) \le eO_M$ such that
$z(x_i^j(t)) = z_i^j$ if and only if $i \ge -N-1+i_0(t)$ and $j \le -1-j_0(t)$. One further checks that,
for any choice of order minimizing sequences corresponding to $t$, $(x_{-N}^{-2}(t), c_{-N}^{-2}(t))$,

$$\sum_{i=-N}^{-N-1+i_0(t)} \mathrm{ord}(p(z_i|x_i(t),c_i(t)))$$

does not depend on $t$, whereas $j_0(t) = 0$ if and only if $z(x_{-1}) = z_{-1}$. This immediately
implies the lemma. $\Box$



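Example 2.8. (continuation of Example 2.1)
Recall that

$$\Omega_0 = \begin{bmatrix} (1-p)(1-\varepsilon) & p\varepsilon \\ 1-\varepsilon & 0 \end{bmatrix}, \qquad \Omega_1 = \begin{bmatrix} (1-p)\varepsilon & p(1-\varepsilon) \\ \varepsilon & 0 \end{bmatrix}.$$

First, observe that the only Z-allowed sequences are 00, 01, 10; then straightforward computations
show that

$$\Omega_0\Omega_0 = \begin{bmatrix} (1-p)^2(1-\varepsilon)^2 + p\varepsilon(1-\varepsilon) & p(1-p)\varepsilon(1-\varepsilon) \\ (1-p)(1-\varepsilon)^2 & p\varepsilon(1-\varepsilon) \end{bmatrix},$$

$$\Omega_0\Omega_1 = \begin{bmatrix} (1-p)^2\varepsilon(1-\varepsilon) + p\varepsilon^2 & p(1-p)(1-\varepsilon)^2 \\ (1-p)\varepsilon(1-\varepsilon) & p(1-\varepsilon)^2 \end{bmatrix},$$

$$\Omega_1\Omega_0 = \begin{bmatrix} (1-p)^2\varepsilon(1-\varepsilon) + p(1-\varepsilon)^2 & p(1-p)\varepsilon^2 \\ (1-p)\varepsilon(1-\varepsilon) & p\varepsilon^2 \end{bmatrix}.$$

Note that in the spirit of Lemma 2.7, for each of these three matrices, there is a unique
column, each of whose entries minimizes the orders over all the entries in the same row.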
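The orders of the entries in Example 2.8 can be recomputed mechanically. The sketch below is one way to do so, assuming a fixed numeric value of `p` and representing each matrix entry as a list of polynomial coefficients in $\varepsilon$ (constant term first); the helper names and the tolerance used to detect a zero coefficient are choices made for this illustration.

```python
def pmul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def padd(a, b):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
            for i in range(n)]

def order(a, tol=1e-12):
    """Degree of the first non-zero term; None for the zero polynomial."""
    for i, c in enumerate(a):
        if abs(c) > tol:
            return i
    return None

def omega_poly(z, p):
    """Omega_z of Example 2.1 with polynomial entries in eps."""
    one_minus_eps, eps = [1.0, -1.0], [0.0, 1.0]
    e0, e1 = (one_minus_eps, eps) if z == 0 else (eps, one_minus_eps)
    scale = lambda c, a: [c * x for x in a]
    return [[scale(1 - p, e0), scale(p, e1)], [e0, [0.0]]]

def matmul_poly(A, B):
    """Product of two 2x2 matrices with polynomial entries."""
    return [[padd(pmul(A[i][0], B[0][j]), pmul(A[i][1], B[1][j]))
             for j in range(2)] for i in range(2)]

def order_matrix(zs, p=0.3):
    """Entrywise orders of Omega_{z_1} Omega_{z_2} ... for the word zs."""
    M = omega_poly(zs[0], p)
    for z in zs[1:]:
        M = matmul_poly(M, omega_poly(z, p))
    return [[order(M[i][j]) for j in range(2)] for i in range(2)]
```

In each of the three order matrices one column attains the row-wise minimum in every row, which is the pattern the example points out: the first column for 00 and 10, the second column for 01.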



Now fix $N \ge 2eO_M$. For any $w \in \mathcal{W}$, let $v = f_{z_{-N}^{-1}}(w)$. Note that the mapping $f_{z_{-N}^{-1}}$
implicitly depends on $\varepsilon$, so $v$ is in fact a function of $\varepsilon$. If $z_{-N}^{-1} = z(x_{-N}^{-1}) \in \mathcal{Z}^N$ is Z-allowed,
by Lemma 2.7, when $\varepsilon = 0$,

• $v_i = 0$ if and only if $i \notin I_{z_{-1}}$,

• for each $i = (x_{-1}, c_{-1}) \in I_{z_{-1}}$, $v_i = q_{c_{-1}}$, which does not depend on $w$.

Let $q(z) \in \mathcal{W}$ be the point defined by $q(z)_{(x,c)} = q_c$ for all $(x,c)$ with $z(x) = z$ and 0
otherwise. If $z_{-N}^{-1}$ is Z-allowed, then

$$\lim_{\varepsilon \to 0} f_{z_{-N}^{-1}}(w) = q(z_{-1});$$

thus, in this limiting sense, at $\varepsilon = 0$, $f_{z_{-N}^{-1}}$ maps the entire simplex $\mathcal{W}$ to a single point
$q(z_{-1})$. The following lemma says that if $z_{-N-1}^{-1}$ is Z-allowed, then in a small neighbourhood
of $q(z_{-N-1})$, the derivative of $f_{z_{-N}^{-1}}$ is much smaller than what would be given by repeated
application of Lemma 2.6.

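Lemma 2.9. Given $\delta_0 > 0$, there exists $\varepsilon_0 > 0$ and $C_c > 0$ such that on $U_{\delta_0,\varepsilon_0}$, if $z_{-N-1}^{-1}$ is
Z-allowed, then $|D_w f_{z_{-N}^{-1}}| \le C_c\varepsilon$ on some neighbourhood of $q(z_{-N-1})$.

Proof. By the observations above, for all $w \in \mathcal{W}$, we have

$$f_{z_{-N}^{-1}}(w) = q(z_{-1}) + \varepsilon\, r(w),$$

where $r(w)$ is a rational vector-valued function with common denominator of order 0 (in $\varepsilon$)
and leading coefficient uniformly bounded away from 0 near $w = q(z_{-N-1})$ over all $\vec{p} \in \mathcal{M}_{\delta_0}$.
The lemma then immediately follows. $\Box$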

2.2 Proof of Proposition 2.5

We now explain the rough idea of the proof of Proposition 2.5, for only the special case
$\ell = 0$, i.e., exponential convergence of the difference between $p(z_0|z_{-n}^{-1})$ and $p(z_0|z_{-n-1}^{-1})$. Let
$N$ be as above and for simplicity consider only output sequences of length a multiple of $N$:
$n = n_0N$. We can compute an estimate of $D_w f_{z_{-n}^{0}}$ by using the chain rule (with appropriate
care at $\varepsilon = 0$) and multiplying the estimates on $|D_w f_{z_{-iN}^{(-i+1)N}}|$ given by Lemmas 2.6 and 2.9.
This yields an estimate of the form $|D_w f_{z_{-n}^{0}}| \le (A\varepsilon^{1-B\alpha})^n$ for some constants $A$ and $B$,
on the entire simplex $\mathcal{W}$. If $\alpha$ is sufficiently small and $z_{-n}^{-1}$ is $\alpha$-typical, then the estimate
from Lemma 2.9 applies enough of the time that $f_{z_{-n}^{0}}$ exponentially contracts the simplex.
Then, interpreting elements of the simplex as conditional probabilities $p((x_i,c_i) = \cdot \,|\, z_{-m}^i)$,
we obtain exponential convergence of the difference $|p(z_0|z_{-n}^{-1}) - p(z_0|z_{-n-1}^{-1})|$, as desired.
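The contraction described in this rough idea can be observed numerically. The sketch below, assuming the BSC with the $(1,\infty)$-RLL input of Example 2.1 and `eps > 0`, pushes the two extreme points of the simplex through the induced maps along a Z-allowed output word and records how far apart the images remain; the function names and the particular word are illustrative choices.

```python
def omega(z, p, eps):
    """Omega_z for the BSC(eps) with (1, inf)-RLL input (single channel state)."""
    e0, e1 = (1 - eps, eps) if z == 0 else (eps, 1 - eps)
    return [[(1 - p) * e0, p * e1], [e0, 0.0]]

def f(z, w, p, eps):
    """Induced map f_z(w) = w Omega_z / (w Omega_z 1); requires eps > 0."""
    M = omega(z, p, eps)
    u = [sum(w[x] * M[x][y] for x in range(2)) for y in range(2)]
    s = sum(u)
    return [ui / s for ui in u]

def gaps_along(zs, p, eps):
    """Max-coordinate distance between two trajectories after each output symbol."""
    w1, w2 = [1.0, 0.0], [0.0, 1.0]     # the two vertices of the simplex
    gaps = []
    for z in zs:
        w1, w2 = f(z, w1, p, eps), f(z, w2, p, eps)
        gaps.append(max(abs(a - b) for a, b in zip(w1, w2)))
    return gaps

# An output word with no two consecutive 1's, so every block is Z-allowed.
gaps = gaps_along((0, 1, 0, 1, 0, 0, 1, 0), p=0.3, eps=0.01)
```

With these parameters the gap shrinks by roughly a factor proportional to `eps` at each step, so the initial condition is forgotten after a handful of symbols. Words that contain many "breaks" would not show this behaviour, which is why the argument restricts to $\alpha$-typical sequences.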

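Proof of Proposition 2.5. For simplicity, we only consider the special case that $n = n_0N$, $m = m_0N$,
$\hat{m} = \hat{m}_0N$ for a fixed $N \ge 2eO_M$; the general case can be easily reduced to this special
case. For the sequences $z_{-m}^{-1}, \hat{z}_{-\hat{m}}^{-1}$, define their "blocked" versions $[z]_{-m_0}^{-1}, [\hat{z}]_{-\hat{m}_0}^{-1}$ by setting

$$[z]_i = z_{iN}^{(i+1)N-1},\ i = -m_0, -m_0+1, \cdots, -1, \qquad [\hat{z}]_j = \hat{z}_{jN}^{(j+1)N-1},\ j = -\hat{m}_0, -\hat{m}_0+1, \cdots, -1.$$

Let

$$w_{i,-m} = w_{i,-m}(z_{-m}^i) = p((x_i,c_i) = \cdot \,|\, z_{-m}^i),$$

where $\cdot$ denotes the possible states of the Markov chain $(X,C)$. Then one checks that

$$p(z_0|z_{-m}^{-1}) = w_{-1,-m}\Omega_{z_0}\mathbf{1} \qquad (5)$$

and $w_{i,-m}$ satisfies the following iteration

$$w_{i+1,-m} = f_{z_{i+1}}(w_{i,-m}), \quad -n \le i \le -1,$$

and the following iteration (corresponding to the blocked chain $[z]_{-n_0}^{-1}$)

$$w_{(i+1)N-1,-m} = f_{[z]_i}(w_{iN-1,-m}), \quad -n_0 \le i \le -1, \qquad (6)$$

starting with

$$w_{-n-1,-m} = p((x_{-n-1},c_{-n-1}) = \cdot \,|\, z_{-m}^{-n-1}).$$

Similarly let

$$\hat{w}_{i,-\hat{m}} = \hat{w}_{i,-\hat{m}}(\hat{z}_{-\hat{m}}^i) = p((x_i,c_i) = \cdot \,|\, \hat{z}_{-\hat{m}}^i),$$

which also satisfies the same iterations as above, however starting with

$$\hat{w}_{-n-1,-\hat{m}} = p((x_{-n-1},c_{-n-1}) = \cdot \,|\, \hat{z}_{-\hat{m}}^{-n-1}).$$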

We say $[z]_{-n_0}^{-1}$ "continues" between $[z]_{i-1}$ and $[z]_i$ if $[z]_{i-1}^{i}$ is Z-allowed; on the other hand,
we say $[z]_{-n_0}^{-1}$ "breaks" between $[z]_{i-1}$ and $[z]_i$ if it does not continue between $[z]_{i-1}$ and $[z]_i$,
namely, if one of the following occurs

1. $[z]_{i-1}$ is not Z-allowed;

2. $[z]_i$ is not Z-allowed;

3. both $[z]_{i-1}$ and $[z]_i$ are Z-allowed, however $[z]_{i-1}^{i}$ is not Z-allowed.

Iteratively applying Lemma 2.6, there is a positive constant $C_e$ such that

$$|D_w f_{[z]_i}| \le C_e^N/\varepsilon^{2NO_M}, \qquad (7)$$

on the entire simplex $\mathcal{W}$. In particular, this holds when $[z]_{-n_0}^{-1}$ "breaks" between $[z]_{i-1}$ and
$[z]_i$. When $[z]_{-n_0}^{-1}$ "continues" between $[z]_{i-1}$ and $[z]_i$, by Lemma 2.9, we have that if $\varepsilon$ is
small enough, there is a constant $C_c > 0$ such that

$$|D_w f_{[z]_i}| \le C_c\varepsilon. \qquad (8)$$

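Now, applying the mean value theorem, we deduce that there exist $\xi_i$, $-n_0 \le i \le -1$ (here
$\xi_i$ is a convex combination of $w_{iN-1,-m}$ and $\hat{w}_{iN-1,-\hat{m}}$) such that

$$|w_{-1,-m} - \hat{w}_{-1,-\hat{m}}| = |f_{[z]_{-n_0}^{-1}}(w_{-n_0N-1,-m}) - f_{[z]_{-n_0}^{-1}}(\hat{w}_{-n_0N-1,-\hat{m}})|$$
$$\le \prod_{i=-n_0}^{-1} \|D_w f_{[z]_i}(\xi_i)\| \; |w_{-n_0N-1,-m} - \hat{w}_{-n_0N-1,-\hat{m}}|.$$

Since $z_{-n}^{-1}$ is $\alpha$-typical, $[z]_{-n_0}^{-1}$ breaks at most $3\alpha n$ times; in other words, there are at least
$(1/N - 3\alpha)n$ $i$'s corresponding to (8) and at most $3\alpha n$ $i$'s corresponding to (7). We then
have

$$\prod_{i=-n_0}^{-1} \|D_w f_{[z]_i}(\xi_i)\| \le C_c^{(1/N-3\alpha)n}\, C_e^{3\alpha Nn}\, \varepsilon^{(1/N - 3\alpha - 6NO_M\alpha)n}. \qquad (9)$$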



i=-no 
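Let $\alpha_0 = 1/(N(3 + 6NO_M))$. Evidently, when $\alpha < \alpha_0$, $1/N - 3\alpha - 6NO_M\alpha$ is strictly
positive; we then have

$$|w_{-1,-m} - \hat{w}_{-1,-\hat{m}}| = \hat{O}(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0} \times T^\alpha_{n,m,\hat{m}}.$$

It then follows from (5) that

$$|p(z_0|z_{-m}^{-1}) - p(z_0|\hat{z}_{-\hat{m}}^{-1})| = \hat{O}(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0} \times T^\alpha_{n,m,\hat{m}}. \qquad (10)$$

We next show that for each $k$, there is a positive constant $C_{|k|}$ such that

$$|w_{i,-m}^{(k)}|,\ |\hat{w}_{i,-\hat{m}}^{(k)}| \le n^{|k|}C_{|k|}/\varepsilon^{|k|}; \qquad (11)$$

here, the superscript $^{(k)}$ denotes the $k$-th order derivative with respect to $\vec{q} = (\vec{p},\varepsilon)$. In fact,
the partial derivatives with respect to $\vec{p}$ are upper bounded in norm by $n^{|k|}C_{|k|}$.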
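To illustrate the idea, we first prove (11) for $|k| = 1$. Recall that

$$w_{i,-m} = p((x_i,c_i) = \cdot \,|\, z_{-m}^i) = \frac{p((x_i,c_i), z_{-m}^i)}{p(z_{-m}^i)}.$$

Let $q$ be a component of $\vec{q} = (\vec{p},\varepsilon)$. Then,

$$\left|\frac{\partial}{\partial q}\left(\frac{p((x_i,c_i), z_{-m}^i)}{p(z_{-m}^i)}\right)\right| = \left|\frac{\frac{\partial}{\partial q}p((x_i,c_i), z_{-m}^i)}{p(z_{-m}^i)} - \frac{p((x_i,c_i), z_{-m}^i)\,\frac{\partial}{\partial q}p(z_{-m}^i)}{p(z_{-m}^i)^2}\right|$$
$$\le \frac{p((x_i,c_i), z_{-m}^i)}{p(z_{-m}^i)} \left(\left|\frac{\frac{\partial}{\partial q}p((x_i,c_i), z_{-m}^i)}{p((x_i,c_i), z_{-m}^i)}\right| + \left|\frac{\frac{\partial}{\partial q}p(z_{-m}^i)}{p(z_{-m}^i)}\right|\right).$$

We first consider the partial derivative with respect to $\varepsilon$, i.e., $q = \varepsilon$. Since the first factor is
bounded above by 1, it suffices to show that both terms of the second factor are $mO(1/\varepsilon)$
(applying the argument to both $z_{-m}^i$ and $\hat{z}_{-\hat{m}}^i$ and recalling that $n \le m, \hat{m} \le 2n$). We will
prove this only for $|\frac{\partial}{\partial\varepsilon}p(z_{-m}^i)/p(z_{-m}^i)|$, with the proof for the other term being similar. Now

$$p(z_{-m}^i) = \sum g(x_{-m}^i, c_{-m}^i), \qquad (12)$$

where

$$g(x_{-m}^i, c_{-m}^i) = p(x_{-m}) \prod_{j=-m}^{i-1} p(x_{j+1}|x_j) \prod_{j=-m}^{i} p(c_j) \prod_{j=-m}^{i} p(z_j|x_j,c_j)$$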

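and the summation is over all Markov chain sequences $x_{-m}^i$ and channel state sequences $c_{-m}^i$.
Clearly, $\frac{\partial}{\partial\varepsilon}p(z_j|x_j,c_j)/p(z_j|x_j,c_j)$ is $O(1/\varepsilon)$, with a constant that is uniform over all
$\vec{p} \in \mathcal{M}_{\delta_0}$. Thus, by the product rule, each

$$\frac{\partial}{\partial\varepsilon}g(x_{-m}^i, c_{-m}^i) \,/\, g(x_{-m}^i, c_{-m}^i)$$

is $mO(1/\varepsilon)$. Since all terms $g(x_{-m}^i, c_{-m}^i)$ are nonnegative, it then follows from (12) that
$\frac{\partial}{\partial\varepsilon}p(z_{-m}^i)/p(z_{-m}^i) = mO(1/\varepsilon)$, as desired. For the partial derivatives with respect to $\vec{p}$, we observe
that $\frac{\partial}{\partial q}p(x_{-m})/p(x_{-m})$ and $\frac{\partial}{\partial q}p(x_{j+1}|x_j)/p(x_{j+1}|x_j)$ (here, $q$ is a component of $\vec{p}$) are $O(1)$,
with uniform constant over all $\vec{p} \in \mathcal{M}_{\delta_0}$. We then immediately establish (11) for $|k| = 1$.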
We now prove (11) for a generic $k$.

Applying the multivariate Faà di Bruno formula (for the derivatives of a composite function)
[H |6] to the function $f(y) = 1/y$ (here, $y$ is a function), we have for $l$ with $|l| \ne 0$,

$$f(y)^{(l)} = \sum D(a_1, a_2, \cdots, a_t)\,(1/y)\,(y^{(a_1)}/y)(y^{(a_2)}/y)\cdots(y^{(a_t)}/y),$$

where the summation is over the set of unordered sequences of non-negative vectors $a_1, a_2, \cdots, a_t$
with $a_1 + a_2 + \cdots + a_t = l$ and $D(a_1, a_2, \cdots, a_t)$ is the corresponding coefficient. For any $l$,
define $l! = \prod_i l_i!$ and for any $l \le k$ (every component of $l$ is less than or equal to the corresponding
one of $k$), define $C_k^l = k!/(l!(k-l)!)$. Then for any $k$, applying the multivariate
Leibniz rule, we have

$$\left(\frac{p((x_i,c_i), z_{-m}^i)}{p(z_{-m}^i)}\right)^{(k)} = \sum_{l \le k}\ \sum_{a_1+a_2+\cdots+a_t=l} C_k^l\, D(a_1,\cdots,a_t)\, \frac{p((x_i,c_i), z_{-m}^i)}{p(z_{-m}^i)}\, \frac{p((x_i,c_i), z_{-m}^i)^{(k-l)}}{p((x_i,c_i), z_{-m}^i)}\, \frac{p(z_{-m}^i)^{(a_1)}}{p(z_{-m}^i)} \cdots \frac{p(z_{-m}^i)^{(a_t)}}{p(z_{-m}^i)}.$$

Then, similarly as above, one can show that

$$p(z_{-m}^i)^{(k)}/p(z_{-m}^i),\quad p((x_i,c_i), z_{-m}^i)^{(k)}/p((x_i,c_i), z_{-m}^i) = m^{|k|}O(1/\varepsilon^{|k|}), \qquad (13)$$

which implies that there is a positive constant $C_{|k|}$ such that

$$|w_{i,-m}^{(k)}| \le n^{|k|}C_{|k|}/\varepsilon^{|k|}.$$

Obviously, the same argument can be applied to upper bound $|\hat{w}_{i,-\hat{m}}^{(k)}|$.
We next prove that, for each $k$,

$$|w_{-1,-m}^{(k)} - \hat{w}_{-1,-\hat{m}}^{(k)}| = \hat{O}(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0} \times T^\alpha_{n,m,\hat{m}}. \qquad (14)$$

Proposition 2.5 will then follow from (5).

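We first prove this for $|k| = 1$. Again, let $q$ be a component of $\vec{q} = (\vec{p},\varepsilon)$. Then, for
$i = -1, -2, \cdots, -n_0$, we have

$$\frac{\partial}{\partial q}w_{(i+1)N-1,-m} = \frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, w_{iN-1,-m})\,\frac{\partial}{\partial q}w_{iN-1,-m} + \frac{\partial f_{[z]_i}}{\partial q}(\vec{q}, w_{iN-1,-m}), \qquad (15)$$

and

$$\frac{\partial}{\partial q}\hat{w}_{(i+1)N-1,-\hat{m}} = \frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\,\frac{\partial}{\partial q}\hat{w}_{iN-1,-\hat{m}} + \frac{\partial f_{[z]_i}}{\partial q}(\vec{q}, \hat{w}_{iN-1,-\hat{m}}). \qquad (16)$$

Taking the difference, we then have

$$\frac{\partial}{\partial q}w_{(i+1)N-1,-m} - \frac{\partial}{\partial q}\hat{w}_{(i+1)N-1,-\hat{m}} = \left(\frac{\partial f_{[z]_i}}{\partial q}(\vec{q}, w_{iN-1,-m}) - \frac{\partial f_{[z]_i}}{\partial q}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right)$$
$$+ \left(\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, w_{iN-1,-m})\,\frac{\partial}{\partial q}w_{iN-1,-m} - \frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\,\frac{\partial}{\partial q}w_{iN-1,-m}\right)$$
$$+ \left(\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\,\frac{\partial}{\partial q}w_{iN-1,-m} - \frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\,\frac{\partial}{\partial q}\hat{w}_{iN-1,-\hat{m}}\right).$$

This last expression is the sum of three terms, which we will refer to as $T_1$, $T_2$ and $T_3$.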
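From Lemma 2.6, one checks that for all $[z]_i \in \mathcal{Z}^N$, $w \in \mathcal{W}$ and $\vec{q} \in U_{\delta_0,\varepsilon_0}$,

$$\left|\frac{\partial^2 f_{[z]_i}}{\partial q\,\partial w}(\vec{q},w)\right|,\ \left|\frac{\partial^2 f_{[z]_i}}{\partial w\,\partial w}(\vec{q},w)\right| \le C/\varepsilon^{4NO_M}.$$

(Here, we remark that there are many different constants in this proof, which we will often
refer to using the same notation $C$, making sure that the dependence of these constants
on various parameters is clear.) It then follows from the mean value theorem that for each
$i = -1, -2, \cdots, -n_0$

$$|T_1| \le (C/\varepsilon^{4NO_M})\,|w_{iN-1,-m} - \hat{w}_{iN-1,-\hat{m}}|.$$

By the mean value theorem and (11),

$$|T_2| \le (C/\varepsilon^{4NO_M})(nC_1/\varepsilon)\,|w_{iN-1,-m} - \hat{w}_{iN-1,-\hat{m}}|.$$

And finally

$$|T_3| \le \left\|\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right\| \left|\frac{\partial}{\partial q}w_{iN-1,-m} - \frac{\partial}{\partial q}\hat{w}_{iN-1,-\hat{m}}\right|.$$

Thus,

$$\left|\frac{\partial}{\partial q}w_{(i+1)N-1,-m} - \frac{\partial}{\partial q}\hat{w}_{(i+1)N-1,-\hat{m}}\right| \le \left\|\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right\| \left|\frac{\partial}{\partial q}w_{iN-1,-m} - \frac{\partial}{\partial q}\hat{w}_{iN-1,-\hat{m}}\right|$$
$$+ (1 + nC_1/\varepsilon)\,C\varepsilon^{-4NO_M}\,|w_{iN-1,-m} - \hat{w}_{iN-1,-\hat{m}}|.$$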
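Iteratively applying this inequality, we obtain

$$\left|\frac{\partial}{\partial q}w_{-1,-m} - \frac{\partial}{\partial q}\hat{w}_{-1,-\hat{m}}\right| \le \prod_{i=-n_0}^{-1}\left\|\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right\| \left|\frac{\partial}{\partial q}w_{-n_0N-1,-m} - \frac{\partial}{\partial q}\hat{w}_{-n_0N-1,-\hat{m}}\right|$$
$$+ \prod_{i=-n_0+1}^{-1}\left\|\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right\| (1 + nC_1/\varepsilon)\,C\varepsilon^{-4NO_M}\,|w_{-n_0N-1,-m} - \hat{w}_{-n_0N-1,-\hat{m}}|$$
$$+ \cdots + \prod_{i=-j}^{-1}\left\|\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right\| (1 + nC_1/\varepsilon)\,C\varepsilon^{-4NO_M}\,|w_{(-j-1)N-1,-m} - \hat{w}_{(-j-1)N-1,-\hat{m}}|$$
$$+ \cdots + \left\|\frac{\partial f_{[z]_{-1}}}{\partial w}(\vec{q}, \hat{w}_{-N-1,-\hat{m}})\right\| (1 + nC_1/\varepsilon)\,C\varepsilon^{-4NO_M}\,|w_{-2N-1,-m} - \hat{w}_{-2N-1,-\hat{m}}|$$
$$+ (1 + nC_1/\varepsilon)\,C\varepsilon^{-4NO_M}\,|w_{-N-1,-m} - \hat{w}_{-N-1,-\hat{m}}|. \qquad (17)$$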



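Now, applying the mean value theorem, we deduce that there exist $\xi_i$, $-n_0 \le i \le -j-2$
(here $\xi_i$ is a convex combination of $w_{iN-1,-m}$ and $\hat{w}_{iN-1,-\hat{m}}$) such that

$$|w_{(-j-1)N-1,-m} - \hat{w}_{(-j-1)N-1,-\hat{m}}| = |f_{[z]_{-n_0}^{-j-2}}(w_{-n_0N-1,-m}) - f_{[z]_{-n_0}^{-j-2}}(\hat{w}_{-n_0N-1,-\hat{m}})|$$
$$\le \prod_{i=-n_0}^{-j-2} \|D_w f_{[z]_i}(\xi_i)\| \; |w_{-n_0N-1,-m} - \hat{w}_{-n_0N-1,-\hat{m}}|.$$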

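Then, recall that an $\alpha$-typical sequence $z_{-n}^{-1}$ breaks at most $3\alpha n$ times. Thus there are at
least $(1/N - 3\alpha)n$ $i$'s where we can use the estimate (8) and at most $3\alpha n$ $i$'s where we can only
use the weaker estimate (7). Similar to the derivation of (9), with Remark 2.2, we derive
that for any $\alpha < \alpha_0$, every term in the right hand side of (17) is $\hat{O}(\varepsilon^n)$ on $\mathcal{M}_{\delta_0} \times T^\alpha_{n,m,\hat{m}}$ (we
use (11) to upper bound the first term). Again, with Remark 2.2, we conclude that

$$\left|\frac{\partial}{\partial q}w_{-1,-m} - \frac{\partial}{\partial q}\hat{w}_{-1,-\hat{m}}\right| = \hat{O}(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0} \times T^\alpha_{n,m,\hat{m}},$$

which, by (5), implies the proposition for $\ell = 1$, as desired.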

The proof of (14) for a generic $k$ is rather similar, however very tedious. We next briefly
illustrate the idea of the proof. Note that (compare with (15), (16) for $|k| = 1$)

$$w^{(k)}_{(i+1)N-1,-m} = \frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, w_{iN-1,-m})\, w^{(k)}_{iN-1,-m} + \text{others}$$

and

$$\hat{w}^{(k)}_{(i+1)N-1,-\hat{m}} = \frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\, \hat{w}^{(k)}_{iN-1,-\hat{m}} + \text{others},$$

where the first "others" is a linear combination of terms taking the following forms (below, $t$
can be 0, which corresponds to the partial derivatives of $f$ with respect to the first argument
$\vec{q}$):

$$f^{(k')}_{[z]_i}(\vec{q}, w_{iN-1,-m})\, w^{(a_1)}_{iN-1,-m} \cdots w^{(a_t)}_{iN-1,-m},$$

and the second "others" is a linear combination of terms taking the following forms:

$$f^{(k')}_{[z]_i}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\, \hat{w}^{(a_1)}_{iN-1,-\hat{m}} \cdots \hat{w}^{(a_t)}_{iN-1,-\hat{m}};$$

here $k' \le k$, $t \le |k|$ and $|a_i| < |k|$ for all $i$. Using (11) and the fact that there exists a
constant $C$ (by Lemma 2.6) such that

$$|f^{(k')}_{[z]_i}(\vec{q}, w_{iN-1,-m})| \le C/\varepsilon^{4NO_M|k'|},$$

we then can establish (compare with (17) for $|k| = 1$)

$$|w^{(k)}_{(i+1)N-1,-m} - \hat{w}^{(k)}_{(i+1)N-1,-\hat{m}}| \le \left\|\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right\| \, |w^{(k)}_{iN-1,-m} - \hat{w}^{(k)}_{iN-1,-\hat{m}}| + \text{others},$$

where "others" is the sum of finitely many terms, each of which takes the following form (see
the $j$-th term of (17) for $|k| = 1$)

$$n^{D_{k'}}\, O(1/\varepsilon^{D_{k'}}) \prod_{i=-j}^{-1} \left\|\frac{\partial f_{[z]_i}}{\partial w}(\vec{q}, \hat{w}_{iN-1,-\hat{m}})\right\| \, |w^{(a)}_{(-j-1)N-1,-m} - \hat{w}^{(a)}_{(-j-1)N-1,-\hat{m}}|, \qquad (18)$$

where $|a| < |k|$, and $D_{k'}$ is a constant dependent on $k'$. Then inductively, one can use a similar
approach to establish that (18) is $\hat{O}(\varepsilon^n)$ on $\mathcal{M}_{\delta_0} \times T^\alpha_{n,m,\hat{m}}$, which implies (14) for a generic
$k$, and thus the proposition for a generic $\ell$.

$\Box$

2.3 Asymptotic behavior of entropy rate 

The parameterization of $Z$ as a function of $\varepsilon$ fits in the framework of [1] in a more general
setting. Consequently, we have the following three propositions.

Proposition 2.10. Assume that $\vec{p} \in \mathcal{M}_0$. For any sequence $z_{-n}^0 \in \mathcal{Z}^{n+1}$, $p((x_{-1},c_{-1}) = \cdot \,|\, z_{-n}^{-1})$
and $p(z_0|z_{-n}^{-1})$ are analytic around $\varepsilon = 0$. Moreover, $\mathrm{ord}(p(z_0|z_{-n}^{-1})) \le O_M$.

Proof. Analyticity of $p((x_{-1},c_{-1}) = \cdot \,|\, z_{-n}^{-1})$ follows from Proposition 2.4 in [4]. It then follows
from $p(z_0|z_{-n}^{-1}) = p((x_{-1},c_{-1}) = \cdot \,|\, z_{-n}^{-1})\,\Omega_{z_0}\mathbf{1}$ and the fact that any row sum of $\Omega_{z_0}$ is non-zero
that $p(z_0|z_{-n}^{-1})$ is analytic with $\mathrm{ord}(p(z_0|z_{-n}^{-1})) \le O_M$. $\Box$

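Proposition 2.11. (see Proposition 2.7 in [4]) Assume that $\vec p \in \mathcal{M}_{\delta_0}$. For two fixed hidden
Markov chain sequences $z_{-m}^0$, $\hat z_{-\hat m}^0$ such that

$$z_{-n}^0 = \hat z_{-n}^0, \qquad \mathrm{ord}(p(z_{-n}^{-1} \mid z_{-m}^{-n-1})),\ \mathrm{ord}(p(\hat z_{-n}^{-1} \mid \hat z_{-\hat m}^{-n-1})) \le k$$

for some $n \le m, \hat m$ and some $k$, we have for $j$ with $0 \le j \le n - 4k - 1$,

$$p^{(j)}(z_0 \mid z_{-m}^{-1})(0) = p^{(j)}(\hat z_0 \mid \hat z_{-\hat m}^{-1})(0),$$

where the derivatives are taken with respect to $\varepsilon$.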

Remark 2.12. It follows from Proposition 2.11 that for any $\alpha$-typical sequence $z_{-n}^{-1}$ with $\alpha$
small enough and $n$ large enough, $\mathrm{ord}(p(z_0 \mid z_{-n}^{-1})) = \mathrm{ord}(p(z_0 \mid z_{-n-1}^{-1}))$.

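Proposition 2.13. (see Theorem 2.8 in [4]) Assume that $\vec p \in \mathcal{M}_{\delta_0}$. For any $k \ge 0$,

$$H(Z) = H(Z)|_{\varepsilon=0} + \sum_{j=1}^{k} g_j \varepsilon^j + \sum_{j=1}^{k+1} f_j \varepsilon^j \log\varepsilon + O(\varepsilon^{k+1}), \tag{19}$$

where $f_j$'s and $g_j$'s depend on $\Pi$ and $q_c$ (but not on $\varepsilon$); here $\Pi$ is the transition probability matrix of
$X$.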
The following theorem strengthens Proposition 2.13 in the sense that it describes how
the coefficients $f_j$'s and $g_j$'s vary with respect to the input Markov chain. We first introduce
some necessary notation. We shall break $H_n(Z)$ into a sum of $G_n(Z)$ and $F_n(Z)\log\varepsilon$, where
$G_n(Z) = G_n(\vec p, \varepsilon)$ and $F_n(Z) = F_n(\vec p, \varepsilon)$ are smooth; precisely, we have

$$H_n(Z) = G_n(\vec p, \varepsilon) + F_n(\vec p, \varepsilon)\log\varepsilon,$$

where ($\mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))$ is well-defined since $p(z_0 \mid z_{-n}^{-1})$ is analytic with respect to $\varepsilon$; see Proposition 2.10)

$$F_n(\vec p, \varepsilon) = \sum_{z_{-n}^0} -\mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))\, p(z_{-n}^0) \tag{20}$$

and

$$G_n(\vec p, \varepsilon) = \sum_{z_{-n}^0} -p(z_{-n}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}), \tag{21}$$

where

$$p^\circ(z_0 \mid z_{-n}^{-1}) = p(z_0 \mid z_{-n}^{-1}) / \varepsilon^{\mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))}.$$
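As a quick numerical illustration of the splitting behind (20) and (21): each summand $-p(z_{-n}^0)\log p(z_0 \mid z_{-n}^{-1})$ separates into a part involving $p^\circ$ and a multiple of $\log\varepsilon$. The sketch below checks this for a single term only. The numerical values (a conditional probability with leading term $0.7\varepsilon^2$, a joint probability $0.05$) and the helper name `split_term` are hypothetical and chosen purely for illustration.

```python
import math

def split_term(joint, cond, order, eps):
    """Split -joint*log(cond) as g + f*log(eps), mirroring (20) and (21).

    joint : weight p(z_{-n}^0) of the term (hypothetical value)
    cond  : conditional probability p(z_0 | z_{-n}^{-1}), assumed to have
            order `order` as a function of eps
    """
    cond_circ = cond / eps**order        # p° = p / eps^ord(p)
    g = -joint * math.log(cond_circ)     # this term's contribution to G_n
    f = -order * joint                   # this term's contribution to F_n
    return g, f

eps = 1e-3
order = 2
cond = 0.7 * eps**order * (1 + 3 * eps)  # hypothetical: leading term 0.7*eps^2
joint = 0.05                             # hypothetical joint probability
g, f = split_term(joint, cond, order, eps)
direct = -joint * math.log(cond)         # the original summand of H_n
recombined = g + f * math.log(eps)       # G-part plus F-part times log(eps)
```

The two quantities `direct` and `recombined` agree up to rounding, which is all the decomposition $H_n = G_n + F_n\log\varepsilon$ asserts term by term.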

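Theorem 2.14. Given $\delta_0 > 0$, for sufficiently small $\varepsilon_0$,

1. On $U_{\delta_0,\varepsilon_0}$, there is an analytic function $F(\vec p, \varepsilon)$ and a smooth (i.e., infinitely differentiable)
function $G(\vec p, \varepsilon)$ such that

$$H(Z(\vec p, \varepsilon)) = G(\vec p, \varepsilon) + F(\vec p, \varepsilon)\log\varepsilon. \tag{22}$$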

Moreover, 



here $f_j$'s and $g_j$'s are the corresponding functions as in Proposition 2.13;

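2. Define $\hat F(\vec p, \varepsilon) = F(\vec p, \varepsilon)/\varepsilon$. Then $\hat F(\vec p, \varepsilon)$ is analytic on $U_{\delta_0,\varepsilon_0}$.

3. For any $\ell$, there exists $0 < \rho < 1$ such that on $U_{\delta_0,\varepsilon_0}$

$$|D^\ell_{\vec p,\varepsilon} F_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} F(\vec p, \varepsilon)| < \rho^n,$$

$$|D^\ell_{\vec p,\varepsilon} \hat F_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} \hat F(\vec p, \varepsilon)| < \rho^n,$$

and

$$|D^\ell_{\vec p,\varepsilon} G_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} G(\vec p, \varepsilon)| < \rho^n$$

for sufficiently large $n$.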



k+l\ 
Proof. 1) Recall that

$$H_n(Z) = \sum_{z_{-n}^0} -p(z_{-n}^0) \log p(z_0 \mid z_{-n}^{-1}).$$

It follows from a compactness argument that $H_n(Z)$ uniformly converges to $H(Z)$ on the
parameter space $U_{\delta_0,\varepsilon_0}$ for any positive $\varepsilon_0$. We now define

$$H_n^\alpha(Z) = \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0} -p(z_{-n}^0) \log p(z_0 \mid z_{-n}^{-1});$$

here recall that $T_n^\alpha$ denotes the set of all $\alpha$-typical $\mathcal{Z}$-sequences with length $n$. Applying
Lemma 2.3, we deduce that $H_n^\alpha(Z)$ uniformly converges to $H(Z)$ on $U_{\delta_0,\varepsilon_0}$ as well.

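By Proposition 2.10, $p(z_0 \mid z_{-n}^{-1})$ is analytic with $\mathrm{ord}(p(z_0 \mid z_{-n}^{-1})) \le O_M$. It then follows
that for any $\alpha$ with $0 < \alpha < 1$ (we will choose $\alpha$ to be smaller later if necessary),

$$H_n^\alpha(Z) = G_n^\alpha(\vec p, \varepsilon) + F_n^\alpha(\vec p, \varepsilon)\log\varepsilon,$$

where

$$F_n^\alpha(\vec p, \varepsilon) = \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0} -\mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))\, p(z_{-n}^0)$$

and

$$G_n^\alpha(\vec p, \varepsilon) = \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0} -p(z_{-n}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}).$$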



The idea of the proof is as follows. We first show that $F_n^\alpha(\vec p, \varepsilon)$ uniformly converges to a
real analytic function $F(\vec p, \varepsilon)$. We then prove that $G_n^\alpha(\vec p, \varepsilon)$ and its derivatives with respect
to $(\vec p, \varepsilon)$ also uniformly converge to a smooth function $G(\vec p, \varepsilon)$. Since $H_n^\alpha(Z)$ uniformly
converges to $H(Z)$, $F(\vec p, \varepsilon)$ and $G(\vec p, \varepsilon)$ satisfy (22). The "Moreover" part then immediately
follows by equating (19) and (22) to compare the coefficients.

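We now show that $F_n^\alpha(\vec p, \varepsilon)$ uniformly converges to a real analytic function $F(\vec p, \varepsilon)$. Now

$$
\begin{aligned}
|F_n^\alpha(\vec p, \varepsilon) - F_{n+1}^\alpha(\vec p, \varepsilon)|
&= \Bigg| \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0} \mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))\, p(z_{-n}^0) - \sum_{z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0} \mathrm{ord}(p(z_0 \mid z_{-n-1}^{-1}))\, p(z_{-n-1}^0) \Bigg| \\
&= \Bigg| \Bigg( \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} + \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \notin T_{n+1}^\alpha,\, z_0}} \Bigg) \mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))\, p(z_{-n-1}^0) \\
&\qquad - \Bigg( \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} + \sum_{\substack{z_{-n}^{-1} \notin T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} \Bigg) \mathrm{ord}(p(z_0 \mid z_{-n-1}^{-1}))\, p(z_{-n-1}^0) \Bigg|.
\end{aligned}
$$

By Remark 2.12, the two sums over sequences that are typical at both lengths cancel, and we have

$$
|F_n^\alpha(\vec p, \varepsilon) - F_{n+1}^\alpha(\vec p, \varepsilon)| = \Bigg| \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \notin T_{n+1}^\alpha,\, z_0}} \mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))\, p(z_{-n-1}^0) - \sum_{\substack{z_{-n}^{-1} \notin T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} \mathrm{ord}(p(z_0 \mid z_{-n-1}^{-1}))\, p(z_{-n-1}^0) \Bigg|.
$$

Applying Lemma 2.3, we have

$$|F_n^\alpha(\vec p, \varepsilon) - F_{n+1}^\alpha(\vec p, \varepsilon)| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0}, \tag{23}$$

which implies that there exists $\varepsilon_0 > 0$ such that $F_n^\alpha(\vec p, \varepsilon)$ are exponentially Cauchy and thus
uniformly converge on $U_{\delta_0,\varepsilon_0}$ to a continuous function $F(\vec p, \varepsilon)$.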

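Let $F_n^{\alpha,\mathbb{C}}(\vec p, \varepsilon)$ denote the complexified $F_n^\alpha(\vec p, \varepsilon)$ on $(\vec p, \varepsilon)$ with $\vec p \in \mathcal{M}^{\mathbb{C}}_{\delta_0}(\eta_0)$ and $|\varepsilon| \le \varepsilon_0$.
Then, using Lemma 2.4 and a similar argument as above, we can prove that

$$|F_n^{\alpha,\mathbb{C}}(\vec p, \varepsilon) - F_{n+1}^{\alpha,\mathbb{C}}(\vec p, \varepsilon)| = \hat O(|\varepsilon|^n) \text{ on } \mathcal{M}^{\mathbb{C}}_{\delta_0}(\eta_0), \tag{24}$$

in other words, for some $\eta_0, \varepsilon_0 > 0$, $F_n^{\alpha,\mathbb{C}}(\vec p, \varepsilon)$ are exponentially Cauchy and thus uniformly
converge on all $(\vec p, \varepsilon)$ with $\vec p \in \mathcal{M}^{\mathbb{C}}_{\delta_0}(\eta_0)$ and $|\varepsilon| \le \varepsilon_0$. Therefore, $F(\vec p, \varepsilon)$ is analytic with
respect to $(\vec p, \varepsilon)$ on $U_{\delta_0,\varepsilon_0}$.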

We now prove that $G_n^\alpha(\vec p, \varepsilon)$ and its derivatives with respect to $(\vec p, \varepsilon)$ uniformly converge
to a smooth function $G(\vec p, \varepsilon)$ and its derivatives.

Although the convergence of $G_n^\alpha(\vec p, \varepsilon)$ and its derivatives can be proven through the same
argument at once, we first prove the convergence of $G_n^\alpha(\vec p, \varepsilon)$ only, for illustrative purposes.

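For any $\alpha, \beta > 0$, we have

$$|\log\alpha - \log\beta| \le \max\{|(\alpha - \beta)/\beta|,\ |(\alpha - \beta)/\alpha|\}. \tag{25}$$

Note that the following is contained in Proposition 2.5 ($\ell = 0$):

$$|p^\circ(z_0 \mid z_{-n}^{-1}) - p^\circ(z_0 \mid z_{-n-1}^{-1})| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0} \times T^\alpha_{n,n+1}. \tag{26}$$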
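Inequality (25) is elementary (it follows from $\log(1+x) \le x$ for $x \ge 0$), and a quick randomized check can make it concrete. The following is a minimal sketch; the sampling range, the seed, and the helper names `log_gap` and `relative_gap_bound` are arbitrary choices made here, not part of the argument.

```python
import math
import random

def log_gap(a, b):
    """Left-hand side of (25): |log a - log b|."""
    return abs(math.log(a) - math.log(b))

def relative_gap_bound(a, b):
    """Right-hand side of (25): max{|(a-b)/b|, |(a-b)/a|}."""
    return max(abs((a - b) / b), abs((a - b) / a))

random.seed(0)  # fixed seed so the check is reproducible
pairs = [(random.uniform(1e-6, 10.0), random.uniform(1e-6, 10.0))
         for _ in range(10_000)]

# Collect any pair that violates (25), allowing a small rounding tolerance.
violations = [(a, b) for a, b in pairs
              if log_gap(a, b) > relative_gap_bound(a, b) * (1 + 1e-9) + 1e-15]
```

An empty `violations` list is consistent with (25); it is of course not a proof.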



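One further checks that by Proposition 2.10, there exists a positive constant $C$ such that for
$\varepsilon$ small enough and for any sequence $z_{-n}^{-1}$,

$$p(z_0 \mid z_{-n}^{-1}) \ge C\varepsilon^{O_M},$$

and thus,

$$p^\circ(z_0 \mid z_{-n}^{-1}) \ge C\varepsilon^{O_M}. \tag{27}$$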



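Using (25), (26), (27) and Lemma 2.3, we have

$$
\begin{aligned}
|G_n^\alpha(\vec p, \varepsilon) - G_{n+1}^\alpha(\vec p, \varepsilon)|
&= \Bigg| \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0} -p(z_{-n}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}) - \sum_{z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0} -p(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n-1}^{-1}) \Bigg| \\
&= \Bigg| \Bigg( \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} + \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \notin T_{n+1}^\alpha,\, z_0}} \Bigg) -p(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}) \\
&\qquad - \Bigg( \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} + \sum_{\substack{z_{-n}^{-1} \notin T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} \Bigg) -p(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n-1}^{-1}) \Bigg| \\
&\le \Bigg| \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} -p(z_{-n-1}^0) \big( \log p^\circ(z_0 \mid z_{-n}^{-1}) - \log p^\circ(z_0 \mid z_{-n-1}^{-1}) \big) \Bigg| \\
&\qquad + \Bigg| \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \notin T_{n+1}^\alpha,\, z_0}} -p(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}) \Bigg| + \Bigg| \sum_{\substack{z_{-n}^{-1} \notin T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} -p(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n-1}^{-1}) \Bigg| \\
&\le \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} p(z_{-n-1}^0) \max\Bigg\{ \Bigg| \frac{p^\circ(z_0 \mid z_{-n}^{-1}) - p^\circ(z_0 \mid z_{-n-1}^{-1})}{p^\circ(z_0 \mid z_{-n-1}^{-1})} \Bigg|,\ \Bigg| \frac{p^\circ(z_0 \mid z_{-n}^{-1}) - p^\circ(z_0 \mid z_{-n-1}^{-1})}{p^\circ(z_0 \mid z_{-n}^{-1})} \Bigg| \Bigg\} \\
&\qquad + \Bigg| \sum_{\substack{z_{-n}^{-1} \in T_n^\alpha, \\ z_{-n-1}^{-1} \notin T_{n+1}^\alpha,\, z_0}} -p(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}) \Bigg| + \Bigg| \sum_{\substack{z_{-n}^{-1} \notin T_n^\alpha, \\ z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0}} -p(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n-1}^{-1}) \Bigg| \\
&= \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0},
\end{aligned} \tag{28}
$$

which implies that there exists $\varepsilon_0 > 0$ such that $G_n^\alpha(\vec p, \varepsilon)$ uniformly converges on $U_{\delta_0,\varepsilon_0}$; the
existence of $G(\vec p, \varepsilon)$ then immediately follows.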

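Applying the multivariate Faà di Bruno formula [H E] to the function $f(y) = \log y$, we have
for $l$ with $|l| \ne 0$,

$$f(y)^{(l)} = \sum D(a_1, a_2, \cdots, a_k)\,(y^{(a_1)}/y)(y^{(a_2)}/y) \cdots (y^{(a_k)}/y),$$

where the summation is over the set of unordered sequences of non-negative vectors $a_1, a_2, \cdots, a_k$
with $a_1 + a_2 + \cdots + a_k = l$ and $D(a_1, a_2, \cdots, a_k)$ is the corresponding coefficient. Then for
any $m$, applying the multivariate Leibnitz rule, we have

$$
\begin{aligned}
(G_n^\alpha)^{(m)}(\vec p, \varepsilon)
&= \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0}\ \sum_{l \le m} -C_m^l\, p^{(m-l)}(z_{-n}^0)\, \big(\log p^\circ(z_0 \mid z_{-n}^{-1})\big)^{(l)} \\
&= \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0}\ \sum_{|l| \ne 0,\, l \le m}\ \sum_{a_1 + a_2 + \cdots + a_k = l} -C_m^l\, D(a_1, \cdots, a_k)\, p^{(m-l)}(z_{-n}^0)\, \frac{p^\circ(z_0 \mid z_{-n}^{-1})^{(a_1)}}{p^\circ(z_0 \mid z_{-n}^{-1})} \cdots \frac{p^\circ(z_0 \mid z_{-n}^{-1})^{(a_k)}}{p^\circ(z_0 \mid z_{-n}^{-1})} \\
&\qquad + \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0} -p^{(m)}(z_{-n}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}).
\end{aligned} \tag{29}
$$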



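We tackle the last term of (29) first. Using (25) and (26) and with a parallel argument
obtained through replacing $p(z_{-n}^0)$, $p(z_{-n-1}^0)$ in (28) by $p^{(m)}(z_{-n}^0)$, $p^{(m)}(z_{-n-1}^0)$, respectively,
we can show that

$$
\Bigg| \sum_{z_{-n}^{-1} \in T_n^\alpha,\, z_0} -p^{(m)}(z_{-n}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}) - \sum_{z_{-n-1}^{-1} \in T_{n+1}^\alpha,\, z_0} -p^{(m)}(z_{-n-1}^0) \log p^\circ(z_0 \mid z_{-n-1}^{-1}) \Bigg| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0} \times T^\alpha_{n,n+1},
$$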
where we used the fact that for any $z_{-n}^0$ and $m$, $p^{(m)}(z_{-n}^0)/p(z_{-n}^0)$ is $O(n^{|m|}/\varepsilon^{|m|})$.
And using the identity

$$\alpha_1\alpha_2\cdots\alpha_n - \beta_1\beta_2\cdots\beta_n = (\alpha_1 - \beta_1)\alpha_2\cdots\alpha_n + \beta_1(\alpha_2 - \beta_2)\alpha_3\cdots\alpha_n + \cdots + \beta_1\cdots\beta_{n-1}(\alpha_n - \beta_n),$$

we have

$$
\begin{aligned}
&\Bigg| \prod_{i=1}^{k} \frac{p^\circ(z_0 \mid z_{-n}^{-1})^{(a_i)}}{p^\circ(z_0 \mid z_{-n}^{-1})} - \prod_{i=1}^{k} \frac{p^\circ(z_0 \mid z_{-n-1}^{-1})^{(a_i)}}{p^\circ(z_0 \mid z_{-n-1}^{-1})} \Bigg| \\
&\qquad \le \sum_{i=1}^{k} \Bigg( \prod_{j<i} \Bigg| \frac{p^\circ(z_0 \mid z_{-n-1}^{-1})^{(a_j)}}{p^\circ(z_0 \mid z_{-n-1}^{-1})} \Bigg| \Bigg) \Bigg| \frac{p^\circ(z_0 \mid z_{-n}^{-1})^{(a_i)}}{p^\circ(z_0 \mid z_{-n}^{-1})} - \frac{p^\circ(z_0 \mid z_{-n-1}^{-1})^{(a_i)}}{p^\circ(z_0 \mid z_{-n-1}^{-1})} \Bigg| \Bigg( \prod_{j>i} \Bigg| \frac{p^\circ(z_0 \mid z_{-n}^{-1})^{(a_j)}}{p^\circ(z_0 \mid z_{-n}^{-1})} \Bigg| \Bigg).
\end{aligned}
$$
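The telescoping identity for a difference of products, used in the step above, can be checked numerically. This is a minimal sketch with randomly drawn factors; the function name `telescoped_difference`, the number of factors and the sampling range are hypothetical choices for illustration.

```python
import math
import random

def telescoped_difference(alphas, betas):
    """Sum over i of beta_1..beta_{i-1} * (alpha_i - beta_i) * alpha_{i+1}..alpha_n."""
    total = 0.0
    for i in range(len(alphas)):
        head = math.prod(betas[:i])        # beta_1 ... beta_{i-1} (empty product is 1)
        tail = math.prod(alphas[i + 1:])   # alpha_{i+1} ... alpha_n
        total += head * (alphas[i] - betas[i]) * tail
    return total

random.seed(1)  # fixed seed so the check is reproducible
alphas = [random.uniform(-2.0, 2.0) for _ in range(6)]
betas = [random.uniform(-2.0, 2.0) for _ in range(6)]

direct = math.prod(alphas) - math.prod(betas)      # left-hand side of the identity
telescoped = telescoped_difference(alphas, betas)  # right-hand side of the identity
```

Taking absolute values term by term in the right-hand side gives the displayed bound on the difference of the two products of ratios.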



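Now applying the inequality

$$\left| \frac{\beta_1}{\alpha_1} - \frac{\beta_2}{\alpha_2} \right| \le |\beta_1/(\alpha_1\alpha_2)|\,|\alpha_1 - \alpha_2| + |1/\alpha_2|\,|\beta_1 - \beta_2|,$$

we have for any $1 \le i \le k$,

$$
\begin{aligned}
\Bigg| \frac{p^\circ(z_0 \mid z_{-n}^{-1})^{(a_i)}}{p^\circ(z_0 \mid z_{-n}^{-1})} - \frac{p^\circ(z_0 \mid z_{-n-1}^{-1})^{(a_i)}}{p^\circ(z_0 \mid z_{-n-1}^{-1})} \Bigg|
&\le \Bigg| \frac{p^\circ(z_0 \mid z_{-n}^{-1})^{(a_i)}}{p^\circ(z_0 \mid z_{-n}^{-1})\, p^\circ(z_0 \mid z_{-n-1}^{-1})} \Bigg|\, \big| p^\circ(z_0 \mid z_{-n}^{-1}) - p^\circ(z_0 \mid z_{-n-1}^{-1}) \big| \\
&\qquad + \Bigg| \frac{1}{p^\circ(z_0 \mid z_{-n-1}^{-1})} \Bigg|\, \big| p^\circ(z_0 \mid z_{-n}^{-1})^{(a_i)} - p^\circ(z_0 \mid z_{-n-1}^{-1})^{(a_i)} \big|.
\end{aligned}
$$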
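The elementary inequality for a difference of two ratios, applied just above, comes from writing $\beta_1/\alpha_1 - \beta_2/\alpha_2 = \beta_1(\alpha_2 - \alpha_1)/(\alpha_1\alpha_2) + (\beta_1 - \beta_2)/\alpha_2$ and using the triangle inequality. A small randomized check, as a sketch only (the helper names, sampling ranges and seed are hypothetical choices):

```python
import random

def ratio_gap(a1, b1, a2, b2):
    """Left-hand side |b1/a1 - b2/a2|."""
    return abs(b1 / a1 - b2 / a2)

def ratio_gap_bound(a1, b1, a2, b2):
    """Right-hand side |b1/(a1*a2)|*|a1 - a2| + |1/a2|*|b1 - b2|."""
    return abs(b1 / (a1 * a2)) * abs(a1 - a2) + abs(1.0 / a2) * abs(b1 - b2)

random.seed(2)  # fixed seed so the check is reproducible

def sample_nonzero():
    """Random denominator with magnitude in [0.1, 5] and random sign."""
    return random.choice((-1.0, 1.0)) * random.uniform(0.1, 5.0)

# Each sample is (alpha_1, beta_1, alpha_2, beta_2).
samples = [(sample_nonzero(), random.uniform(-5.0, 5.0),
            sample_nonzero(), random.uniform(-5.0, 5.0))
           for _ in range(10_000)]

# The bound can hold with equality, so allow a small rounding tolerance.
violations = [s for s in samples
              if ratio_gap(*s) > ratio_gap_bound(*s) * (1 + 1e-9) + 1e-12]
```

In the proof, $\alpha_1, \alpha_2$ play the role of $p^\circ(z_0 \mid z_{-n}^{-1})$, $p^\circ(z_0 \mid z_{-n-1}^{-1})$ and $\beta_1, \beta_2$ the role of their $a_i$-th derivatives.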



It follows from the multivariate Leibnitz rule and (11) that there exists a positive constant $C_a$
such that

$$|p(z_0 \mid z_{-n}^{-1})^{(a)}| = |(w_{-1,-n}\,\Omega_{z_0}\mathbf{1})^{(a)}| \le n^{|a|} C_a / \varepsilon^{|a|}, \tag{30}$$

and furthermore there exists a positive constant C2 such that for any zZn ^ -2", 



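(31)

Combining (27), (29), (30) and (31) gives us

$$|(G_n^\alpha)^{(m)}(\vec p, \varepsilon) - (G_{n+1}^\alpha)^{(m)}(\vec p, \varepsilon)| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0}. \tag{32}$$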



This implies that there exists $\varepsilon_0 > 0$ such that $G_n^\alpha(\vec p, \varepsilon)$ and its derivatives with respect
to $(\vec p, \varepsilon)$ uniformly converge on $U_{\delta_0,\varepsilon_0}$ to a smooth function $G(\vec p, \varepsilon)$ and correspondingly its
derivatives (here, by Remark 2.2, $\varepsilon_0$ does not depend on $m$).

2) It immediately follows from analyticity of $F(\vec p, \varepsilon)$ and the fact that $\mathrm{ord}\, F(\vec p, \varepsilon) \ge 1$.
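3) Note that

$$F_n(\vec p, \varepsilon) - F_n^\alpha(\vec p, \varepsilon) = \sum_{z_{-n}^{-1} \notin T_n^\alpha,\, z_0} -\mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))\, p(z_{-n}^0).$$

Applying the multivariate Leibnitz rule, then by Proposition 2.10, (30), (13) and Lemma 2.3,
we have for any $\ell$,

$$
|D^\ell_{\vec p,\varepsilon} F_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} F_n^\alpha(\vec p, \varepsilon)| = \Bigg| \sum_{z_{-n}^{-1} \notin T_n^\alpha,\, z_0} -\mathrm{ord}(p(z_0 \mid z_{-n}^{-1}))\, D^\ell_{\vec p,\varepsilon}\big( p(z_0 \mid z_{-n}^{-1})\, p(z_{-n}^{-1}) \big) \Bigg| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0}.
$$

It follows from (24) and the Cauchy integral formula that

$$|D^\ell_{\vec p,\varepsilon} F_{n+1}^\alpha(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} F_n^\alpha(\vec p, \varepsilon)| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0};$$

we then have

$$|D^\ell_{\vec p,\varepsilon} F_{n+1}(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} F_n(\vec p, \varepsilon)| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0},$$

and thus

$$|D^\ell_{\vec p,\varepsilon} F_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} F(\vec p, \varepsilon)| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0},$$

which implies that for any $\ell$, there exist $\varepsilon_0 > 0$, $0 < \rho < 1$ such that on $U_{\delta_0,\varepsilon_0}$

$$|D^\ell_{\vec p,\varepsilon} F_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} F(\vec p, \varepsilon)| < \rho^n,$$

and further

$$|D^\ell_{\vec p,\varepsilon} \hat F_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} \hat F(\vec p, \varepsilon)| < \rho^n,$$

for sufficiently large $n$.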
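Similarly note that

$$G_n(\vec p, \varepsilon) - G_n^\alpha(\vec p, \varepsilon) = \sum_{z_{-n}^{-1} \notin T_n^\alpha,\, z_0} -p(z_{-n}^0) \log p^\circ(z_0 \mid z_{-n}^{-1}).$$

Then by (30), (31), (27) and Lemma 2.3, we have for any $\ell$,

$$
\Bigg| \sum_{z_{-n}^{-1} \notin T_n^\alpha,\, z_0} D^\ell_{\vec p,\varepsilon}\big( -p(z_{-n}^{-1})\, p(z_0 \mid z_{-n}^{-1}) \log p^\circ(z_0 \mid z_{-n}^{-1}) \big) \Bigg| = \hat O(\varepsilon^n) \text{ on } \mathcal{M}_{\delta_0},
$$

which, together with (32), implies that for any $\ell$, there exist $\varepsilon_0 > 0$, $0 < \rho < 1$ such that
on $U_{\delta_0,\varepsilon_0}$

$$|D^\ell_{\vec p,\varepsilon} G_n(\vec p, \varepsilon) - D^\ell_{\vec p,\varepsilon} G(\vec p, \varepsilon)| < \rho^n$$

for sufficiently large $n$.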



□



Remark 2.15. We do not know whether $G(\vec p, \varepsilon)$ is analytic with respect to $(\vec p, \varepsilon)$.
3 Concavity of Mutual Information

Recall that we are considering a parameterized family of finite-state memoryless channels
with inputs restricted to a mixing finite-type constraint $\mathcal{S}$. Again for simplicity, we assume
that $\mathcal{S}$ has order 1.

For parameter value $\varepsilon$, the channel capacity is the supremum of the mutual information
of $Z(X, \varepsilon)$ and $X$ over all stationary input processes $X$ such that $A(X) \subseteq \mathcal{S}$. Here, we
use only first order Markov input processes. While this will typically not achieve the true
capacity, one can approach capacity by using Markov input processes of higher order. As
in Section 2, we identify a first order input Markov process $X$ with its joint probability
vector $\vec p = p_X \in \mathcal{M}$, and we write $Z = Z(\vec p, \varepsilon)$, thereby sometimes notationally suppressing
dependence on $X$ and $\varepsilon$.

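Precisely, the first order capacity is

$$C^1(\varepsilon) = \sup_{\vec p \in \mathcal{M}} I(Z; X) = \sup_{\vec p \in \mathcal{M}} \big( H(Z) - H(Z \mid X) \big) \tag{33}$$

and its $n$-th approximation is

$$C_n^1(\varepsilon) = \sup_{\vec p \in \mathcal{M}} I_n(Z; X) = \sup_{\vec p \in \mathcal{M}} \Big( H_n(Z) - \frac{1}{n+1} H(Z_{-n}^0 \mid X_{-n}^0) \Big). \tag{34}$$

As mentioned earlier, since the channel is memoryless, the second terms in (33) and (34)
both reduce to $H(Z_0 \mid X_0)$, which can be written as:

$$\sum_{x \in \mathcal{X},\, z \in \mathcal{Z}} -p(x) \sum_{c \in \mathcal{C}} p(c)\, p(z \mid x, c) \log \sum_{c \in \mathcal{C}} p(c)\, p(z \mid x, c).$$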
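To make the expression for $H(Z_0 \mid X_0)$ concrete, here is a minimal sketch that evaluates it for a small hypothetical channel. The two-state channel below (state `'a'` flips only the input 1 with probability `eps`, state `'b'` is a binary symmetric channel with crossover `eps`), the state distribution, and all function names are assumptions made for illustration; they are not the channel model of this paper. The input enters only through its marginal $p(x)$, which is a linear function of the joint probability vector.

```python
import math

def cond_entropy_z0_given_x0(p_x, p_c, channel):
    """H(Z0|X0) = sum_{x,z} -p(x) q(z|x) log q(z|x), q(z|x) = sum_c p(c) p(z|x,c)."""
    total = 0.0
    for x, px in p_x.items():
        outputs = set()
        for c in p_c:
            outputs.update(channel[(x, c)])
        for z in outputs:
            q = sum(p_c[c] * channel[(x, c)].get(z, 0.0) for c in p_c)
            if q > 0.0:                      # convention: 0 * log 0 = 0
                total -= px * q * math.log(q)
    return total

def toy_channel(eps):
    """Hypothetical channel: maps (input x, state c) to {output z: p(z|x,c)}."""
    channel = {
        (0, 'a'): {0: 1.0},                  # state 'a' never corrupts input 0
        (1, 'a'): {1: 1.0 - eps, 0: eps},    # state 'a' flips input 1 w.p. eps
    }
    for x in (0, 1):
        channel[(x, 'b')] = {x: 1.0 - eps, 1 - x: eps}   # state 'b' is a BSC(eps)
    return channel

p_c = {'a': 0.6, 'b': 0.4}                   # hypothetical i.i.d. state distribution

def h(p_one, eps):
    """H(Z0|X0) when the input marginal is P(X0 = 1) = p_one."""
    return cond_entropy_z0_given_x0({0: 1.0 - p_one, 1: p_one}, p_c, toy_channel(eps))
```

For this toy channel, `h(p_one, 0.0)` is zero for every input marginal, and `h` is affine in `p_one`, matching the two properties of $H(Z_0 \mid X_0)$ used in the text that follows.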

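Note that this expression is a linear function of $\vec p$ and for all $\vec p$ it vanishes when $\varepsilon = 0$. Using
this and the fact that for a mixing finite-type constraint there is a unique Markov chain of
maximal entropy supported on the constraint [10], one can show that for sufficiently small
$\varepsilon_1 > 0$, $\delta_1 > 0$ and all $0 \le \varepsilon \le \varepsilon_1$,

$$C_n^1(\varepsilon) = \sup_{\vec p \in \mathcal{M}_{\delta_1}} \big( H_n(Z) - H(Z_0 \mid X_0) \big) > \sup_{\vec p \in \mathcal{M} \setminus \mathcal{M}_{\delta_1}} \big( H_n(Z) - H(Z_0 \mid X_0) \big), \tag{35}$$

$$C^1(\varepsilon) = \sup_{\vec p \in \mathcal{M}_{\delta_1}} \big( H(Z) - H(Z_0 \mid X_0) \big) > \sup_{\vec p \in \mathcal{M} \setminus \mathcal{M}_{\delta_1}} \big( H(Z) - H(Z_0 \mid X_0) \big). \tag{36}$$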

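Theorem 3.1. There exist $\varepsilon_0 > 0$, $\delta_0 > 0$ such that for all $0 \le \varepsilon \le \varepsilon_0$,

1. the functions $I_n(Z(\vec p, \varepsilon); X(\vec p))$ and $I(Z(\vec p, \varepsilon); X(\vec p))$ are strictly concave on $\mathcal{M}_{\delta_0}$, with
unique maximizing $\vec p_n(\varepsilon)$ and $\vec p_\infty(\varepsilon)$;

2. the functions $I_n(Z(\vec p, \varepsilon); X(\vec p))$ and $I(Z(\vec p, \varepsilon); X(\vec p))$ uniquely achieve their maxima on
all of $\mathcal{M}$ at $\vec p_n(\varepsilon)$ and $\vec p_\infty(\varepsilon)$;

3. there exists $0 < \rho < 1$ such that

$$|\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon)| < \rho^n.$$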
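As a toy illustration of why strict concavity is useful numerically (this is not the constrained channel model of the theorem): assume an unconstrained i.i.d. Bernoulli($p$) input to a binary symmetric channel with crossover probability $\varepsilon$. The mutual information is then $h(p(1-\varepsilon) + (1-p)\varepsilon) - h(\varepsilon)$ with $h$ the binary entropy, which is strictly concave in $p$ for $\varepsilon < 1/2$, so a simple ternary search locates its unique maximizer. All names below are hypothetical.

```python
import math

def binary_entropy(u):
    """Binary entropy in nats, with the convention 0*log(0) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log(u) - (1.0 - u) * math.log(1.0 - u)

def mutual_information(p, eps):
    """I(X;Z) for i.i.d. Bernoulli(p) input through a BSC(eps) (toy assumption)."""
    out = p * (1.0 - eps) + (1.0 - p) * eps   # P(Z = 1)
    return binary_entropy(out) - binary_entropy(eps)

def argmax_concave(func, lo, hi, iters=200):
    """Ternary search for the maximizer of a strictly concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            lo = m1          # the maximizer lies to the right of m1
        else:
            hi = m2          # the maximizer lies to the left of m2
    return 0.5 * (lo + hi)

eps = 0.05
p_star = argmax_concave(lambda p: mutual_information(p, eps), 0.0, 1.0)
capacity_estimate = mutual_information(p_star, eps)
```

In this toy case the search returns a value close to $p = 1/2$ and `capacity_estimate` is close to $\log 2 - h(\varepsilon)$; for the channels of Theorem 3.1 one would instead optimize over $\vec p \in \mathcal{M}_{\delta_0}$ with a concave optimization method of one's choice.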
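Proof. Part 1: Recall that

$$H(Z(\vec p, \varepsilon)) = G(\vec p, \varepsilon) + \hat F(\vec p, \varepsilon)(\varepsilon\log\varepsilon).$$

By parts 1 and 2 of Theorem 2.14, for some $\varepsilon_0 > 0$, $\delta_0 > 0$, $G(\vec p, \varepsilon)$ and $\hat F(\vec p, \varepsilon)$ are smooth on $U_{\delta_0,\varepsilon_0}$,
and so

$$\lim_{\varepsilon \to 0} D^2_{\vec p}\, G(\vec p, \varepsilon) = D^2_{\vec p}\, G(\vec p, 0)$$

and

$$\lim_{\varepsilon \to 0} D^2_{\vec p}\, \hat F(\vec p, \varepsilon) = D^2_{\vec p}\, \hat F(\vec p, 0),$$

uniformly on $\vec p \in \mathcal{M}_{\delta_0}$. Thus, since $\varepsilon\log\varepsilon \to 0$ as $\varepsilon \to 0$,

$$\lim_{\varepsilon \to 0} D^2_{\vec p}\, H(Z(\vec p, \varepsilon)) = D^2_{\vec p}\, G(\vec p, 0) = D^2_{\vec p}\, H(Z(\vec p, 0)),$$

again uniformly on $\mathcal{M}_{\delta_0}$. Since $D^2_{\vec p}\, H(Z(\vec p, 0))$ is negative definite on $\mathcal{M}_{\delta_0}$ (see [3]), it follows
that for sufficiently small $\varepsilon$, $D^2_{\vec p}\, H(Z(\vec p, \varepsilon))$ is also negative definite on $\mathcal{M}_{\delta_0}$, and thus
$H(Z(\vec p, \varepsilon))$ is also strictly concave on $\mathcal{M}_{\delta_0}$.

Since for all $\varepsilon \ge 0$, $H(Z_0 \mid X_0)$ is a linear function of $\vec p$, $I(Z(\vec p, \varepsilon); X(\vec p))$ is strictly concave
on $\mathcal{M}_{\delta_0}$. This establishes part 1 for $I(Z(\vec p, \varepsilon); X(\vec p))$. By part 3 of Theorem 2.14, for
sufficiently large $n$ ($n \ge N_1$), we obtain the same result (with the same $\varepsilon_0$ and $\delta_0$) for
$I_n(Z(\vec p, \varepsilon); X(\vec p))$. For each $1 \le n < N_1$, one can easily establish strict concavity on $U_{\delta_n,\varepsilon_n}$
for some $\delta_n, \varepsilon_n > 0$.

Part 2: This follows from part 1 and statements (35) and (36).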

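Part 3: For notational simplicity, for fixed $0 \le \varepsilon \le \varepsilon_0$, we rewrite $I(Z(\vec p, \varepsilon); X(\vec p))$, $I_n(Z(\vec p, \varepsilon); X(\vec p))$
as functions $f(\vec p)$, $f_n(\vec p)$, respectively. By the Taylor formula with remainder, there exist
$\eta_1, \eta_2 \in \mathcal{M}_{\delta_0}$ such that

$$
\begin{aligned}
f(\vec p_n(\varepsilon)) &= f(\vec p_\infty(\varepsilon)) + D_{\vec p} f(\vec p_\infty(\varepsilon))(\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon)) \\
&\qquad + (\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon))^T D^2_{\vec p} f(\eta_1)(\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon)),
\end{aligned} \tag{37}
$$

$$
\begin{aligned}
f_n(\vec p_\infty(\varepsilon)) &= f_n(\vec p_n(\varepsilon)) + D_{\vec p} f_n(\vec p_n(\varepsilon))(\vec p_\infty(\varepsilon) - \vec p_n(\varepsilon)) \\
&\qquad + (\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon))^T D^2_{\vec p} f_n(\eta_2)(\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon)),
\end{aligned} \tag{38}
$$

here the superscript $T$ denotes the transpose.
By part 2 of Theorem 3.1,

$$D_{\vec p} f(\vec p_\infty(\varepsilon)) = 0, \qquad D_{\vec p} f_n(\vec p_n(\varepsilon)) = 0. \tag{39}$$

By part 3 of Theorem 2.14 with $\ell = 0$, there exists $0 < \rho_0 < 1$ such that

$$|f(\vec p_\infty(\varepsilon)) - f_n(\vec p_\infty(\varepsilon))| \le \rho_0^n, \qquad |f(\vec p_n(\varepsilon)) - f_n(\vec p_n(\varepsilon))| \le \rho_0^n. \tag{40}$$

Adding (37) and (38) and then using (39) and (40), we have

$$\big| (\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon))^T \big( D^2_{\vec p} f(\eta_1) + D^2_{\vec p} f_n(\eta_2) \big) (\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon)) \big| \le 2\rho_0^n.$$

Since $f$ and $f_n$ are strictly concave on $\mathcal{M}_{\delta_0}$, $D^2_{\vec p} f(\eta_1)$ and $D^2_{\vec p} f_n(\eta_2)$ are both negative definite.
Thus there exists some positive constant $K$ such that

$$K|\vec p_n(\varepsilon) - \vec p_\infty(\varepsilon)|^2 \le 2\rho_0^n.$$

This, together with part 1 of Lemma 2.4, implies the existence of $\rho$.

□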

Example 3.2. Consider Example 2.1. For sufficiently small $\varepsilon$ and $p$ bounded away from 0
and 1, part 1 of Theorem 2.14 gives an expression for $H(Z(p, \varepsilon))$ and part 1 of Theorem 3.1
shows that $I(Z(p, \varepsilon))$ is strictly concave and thus has negative second derivative. In this
case, the results boil down to the strict concavity of the binary entropy function; that is,
when $\varepsilon = 0$, $H(Z) = H(X) = -p\log p - (1-p)\log(1-p)$, and one computes the second
derivative with respect to $p$:

$$H''(Z)|_{\varepsilon=0} = -\frac{1}{p} - \frac{1}{1-p} < -2.$$

So, there is $\varepsilon_0$ such that whenever $0 \le \varepsilon \le \varepsilon_0$, $H''(Z) < 0$.
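The second-derivative computation at $\varepsilon = 0$ can be checked with a central finite difference. This is a minimal sketch using natural logarithms; the step size `h`, the grid of $p$ values and the helper names are hypothetical choices, and only the $\varepsilon = 0$ case (the binary entropy function) is covered.

```python
import math

def binary_entropy(p):
    """H(X) = -p log p - (1-p) log(1-p), natural logarithm, for 0 < p < 1."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def second_difference(func, p, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (func(p + h) - 2.0 * func(p) + func(p - h)) / (h * h)

grid = [0.1 * i for i in range(1, 10)]      # p bounded away from 0 and 1
numeric = [second_difference(binary_entropy, p) for p in grid]
closed_form = [-1.0 / p - 1.0 / (1.0 - p) for p in grid]
```

On this grid the finite-difference values agree with $-1/p - 1/(1-p)$ to several digits, and the closed form is largest at $p = 1/2$, where it equals $-4$; in particular it stays below $-2$.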